Let $\overline{X}_k$ and $S_k^2$ denote the sample mean and sample variance based on the first $k$ observations. Then establish the following $$n S_{n+1}^2 = (n-1)S_n^2 + \big( \frac{n}{n+1}\big) (X_{n+1} - \overline{X}_n)^2.$$

I am trying to use the relation $$\overline{X}_{n+1} = \frac{X_{n+1} + n\overline{X}_n}{n+1}.$$

I tried to split the sum into the first $n$ terms and the $(n+1)$th term, but I get $$n S_{n+1}^2 = \sum_{i=1}^{n+1} (X_i - \overline{X}_{n+1})^2 = \sum_{i=1}^n \left(X_i - \frac{X_{n+1} + n \overline{X}_n}{n+1}\right)^2 + \left(\frac{n}{n+1} (X_{n+1} - \overline{X}_n)\right)^2,$$ which has the square of $\frac{n}{n+1}$. How can I get the stated relation? I would greatly appreciate some help.

  • I changed some things like $\overline{X_{n+1}}$ to things like $\overline{X}_{n+1}$. – Michael Hardy Jul 01 '18 at 18:46
  • @takecare: I don't have any helpful hints, but I've noticed a slight difference in the indices in the second term on the RHS of both of your equations. – SOULed_Outt Jul 01 '18 at 18:49
  • Does this help: https://math.stackexchange.com/questions/711135/derivation-of-runningonline-variances-formula/712036#712036 – gammatester Jul 01 '18 at 18:50
  • @takecare: Should the second term be $(X_n - \bar{X}_{n-1})^2$ or $(X_{n+1} - \bar{X}_{n})^2$? – SOULed_Outt Jul 01 '18 at 18:50

1 Answer

The key is to understand how to partition the sums of squares. Consider a sequence of observations $\{x_i\}_{i \ge 1}$ and define from it the sequence of running means $\{\bar x_n\}_{n \ge 1}$, where $\bar x_n = \frac{1}{n} \sum_{i=1}^n x_i$. Furthermore, define the function $f_n$ by the first line below and expand it around an arbitrary second number $\delta'$: $$\begin{align*} f_n(\delta) &= \sum_{i=1}^n (x_i - \delta)^2 \\ &= \sum_{i=1}^n (x_i - \delta' + \delta' - \delta)^2 \\ &= \sum_{i=1}^n \left[ (x_i - \delta')^2 + 2(x_i - \delta')(\delta' - \delta) + (\delta' - \delta)^2 \right] \\ &= f_n(\delta') + 2(\delta' - \delta) \sum_{i=1}^n (x_i - \delta') + n(\delta' - \delta)^2 \\ &= f_n(\delta') + 2(\delta' - \delta) (n\bar x_n - n\delta') + n(\delta' - \delta)^2 \\ &= f_n(\delta') + n(\delta'-\delta)(2(\bar x_n - \delta') + (\delta' - \delta)) \\ &= f_n(\delta') + n(\delta'-\delta)(2\bar x_n - \delta' - \delta). \end{align*}$$

This is a general relationship between the sum of squared deviations of $\{x_i\}_{i = 1}^n$ from some number $\delta$ and the sum of squared deviations from another number $\delta'$, and we are free to choose these values as we please. If we choose $\delta = \bar x_{n+1}$ and $\delta' = \bar x_n$, we obtain $$f_n(\bar x_{n+1}) = f_n(\bar x_n) + n(\bar x_n - \bar x_{n+1})(2\bar x_n - \bar x_n - \bar x_{n+1}) = f_n(\bar x_n) + n(\bar x_n - \bar x_{n+1})^2.$$

But $$(n-1)S_n^2 = f_n(\bar x_n),$$ and $$f_n(\bar x_{n+1}) = f_{n+1}(\bar x_{n+1}) - (x_{n+1} - \bar x_{n+1})^2 = n S_{n+1}^2 - (x_{n+1} - \bar x_{n+1})^2,$$ so $$n S_{n+1}^2 = (n-1)S_n^2 + n(\bar x_n - \bar x_{n+1})^2 + (x_{n+1} - \bar x_{n+1})^2.$$

All that is left is to observe $$\bar x_{n+1} = \frac{n \bar x_n + x_{n+1}}{n+1}$$ and simplify: this gives $\bar x_n - \bar x_{n+1} = \frac{\bar x_n - x_{n+1}}{n+1}$ and $x_{n+1} - \bar x_{n+1} = \frac{n(x_{n+1} - \bar x_n)}{n+1}$, so the last two terms combine to $$\frac{n + n^2}{(n+1)^2}(x_{n+1} - \bar x_n)^2 = \frac{n}{n+1}(x_{n+1} - \bar x_n)^2.$$
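As a quick sanity check, the identity $n S_{n+1}^2 = (n-1)S_n^2 + \frac{n}{n+1}(x_{n+1} - \bar x_n)^2$ can be tested numerically. The following is only a minimal sketch in Python; the function names `sample_variance` and `updated_variance` are made up for this illustration and do not come from the question or from any library.

```python
def sample_variance(xs):
    """Sample variance S_k^2 of k >= 2 observations, with divisor k - 1."""
    k = len(xs)
    mean = sum(xs) / k
    return sum((x - mean) ** 2 for x in xs) / (k - 1)


def updated_variance(n, mean_n, var_n, x_new):
    """Return S_{n+1}^2 from n, the mean and S_n^2 of the first n points, and x_{n+1}.

    Uses n * S_{n+1}^2 = (n - 1) * S_n^2 + n / (n + 1) * (x_{n+1} - mean_n)^2.
    """
    return ((n - 1) * var_n + n / (n + 1) * (x_new - mean_n) ** 2) / n


# Illustrative data: the first n = 3 observations, then one new observation.
xs = [1.0, 2.0, 3.0, 4.0]
n = len(xs) - 1
mean_n = sum(xs[:n]) / n
var_n = sample_variance(xs[:n])

direct = sample_variance(xs)                               # recomputed from scratch
recursive = updated_variance(n, mean_n, var_n, xs[n])      # via the recursion
print(abs(direct - recursive) < 1e-12)
```

For this data both routes give $S_4^2 = 5/3$: the first three points have $\bar x_3 = 2$ and $S_3^2 = 1$, and the update contributes $\frac{3}{4}(4 - 2)^2 = 3$, so $3 S_4^2 = 2 + 3$.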

heropup