
I'm trying to show that SSE and SSR are independent (conditionally on X) but I have to use the following steps/hint. [Hint: Notice you have to consider SSE and SSR as random variables, so be careful how you define them. You may want to use the result that two linear forms U = AX and V = BX, with A and B being constant matrices and X is Normal, are independent iff Cov(U, V) = 0].

I know that the question was posted before but I'm not finding how to prove it using this hint.

Any help is much appreciated!

Elio
  • This isn't actually discrete mathematics, since the normal distribution $$ \frac 1 {\sqrt{2\pi}} e^{-x^2/2} \, dx $$ is at the heart of the matter. $\qquad$ – Michael Hardy May 12 '17 at 18:44

2 Answers


Whether they are independent depends on the distribution of the errors. The context and general conventions make it clear that what is assumed is something like this: $$ Y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i $$ where

  • $\varepsilon_i \sim \operatorname{i.i.d. N}(0,\sigma^2)$ for $i=1,\ldots,n$ and typically $n\gg p;$
  • $x_{ji}$ for $j=1,\ldots,p,$ $i=1,\ldots,n$ are constant (i.e. not random) and observable;
  • $\alpha,\beta_1,\ldots,\beta_p$ are constant and unobservable;
  • $Y_i$ for $i=1,\ldots,n$ are observable (and of course random since $\varepsilon_i$ are random).

One can write $$ Y = X\beta + \varepsilon $$ where

  • $Y\in\mathbb R^{n\times 1}$ (a long column vector);
  • $X\in\mathbb R^{n\times(p+1)}$ (a matrix with many rows and few columns);
  • $\beta\in\mathbb R^{(p+1)\times1}$ (a short column vector);
  • $\varepsilon\in\mathbb R^{n\times 1}$ (the same size and shape as $Y$, of course).

We will show that $\operatorname{SSE} = \|AY\|^2$ and $\operatorname{SSR} = \|BY\|^2$ where $A$ and $B$ are certain matrices with $n$ columns (and also $n$ rows, as we will see).

Central to the problem is this identity: $$ \operatorname{cov}(AY, BY) = A\Big( \operatorname{var}(Y) \Big) B^\top \tag{main identity} $$ and here $\operatorname{var}(Y)$ is an $n\times n$ nonnegative-definite matrix and $\operatorname{cov}(AY, BY)$ is a matrix with as many rows as $A$ and as many columns as $B^\top$ (thus also $n\times n$).

By the information in your quoted "Hint", we only need to show that this covariance is $0$ (i.e. the $n\times n$ zero matrix). It will follow that $AY$ and $BY$ are independent, and therefore functions of them are independent.
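For completeness, here is one way to see the main identity, together with its specialization to the present model (where $\operatorname{var}(Y)=\operatorname{var}(\varepsilon)=\sigma^2 I_n$): $$ \operatorname{cov}(AY,BY) = \operatorname E\Big[\big(AY-\operatorname E[AY]\big)\big(BY-\operatorname E[BY]\big)^\top\Big] = A\operatorname E\Big[(Y-\operatorname E Y)(Y-\operatorname E Y)^\top\Big]B^\top = A\Big(\operatorname{var}(Y)\Big)B^\top = \sigma^2 AB^\top. $$ So in this setting it is enough to check that $AB^\top$ is the $n\times n$ zero matrix.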

The vector of fitted values is the orthogonal projection of the vector $Y$ onto the column space of the matrix $X$; it is therefore $$ \widehat Y = HY $$ where $H\in\mathbb R^{n\times n}$ is the "hat matrix" (so called because it transforms $Y$ to $\widehat Y$): $$ H = X\Big( X^\top X\Big)^{-1} X^\top = \underbrace{\quad X\quad}_{n\times(p+1)} \Big( \underbrace{\quad X^\top X \quad}_{(p+1)\times(p+1)} \Big)^{-1} \underbrace{\quad X^\top \quad}_{(p+1)\times n}. $$

To show that $H$ is the orthogonal projection, it suffices to show two things: (1) if $Y$ is orthogonal to the column space, then $HY=0$; that is easy because in that case $X^\top Y=0.$ (2) If $Y$ is in the column space, then $HY=Y$; that is shown by writing $Y=Xu$ for some $u\in\mathbb R^{(p+1)\times1}$ and then multiplying.
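If it helps to see this concretely, here is a small numerical sanity check of the projection properties of $H$ (a NumPy sketch with an arbitrary made-up design matrix; the sizes and variable names are just illustrative):

```python
import numpy as np

# Sanity check of the hat matrix's projection properties on made-up data.
rng = np.random.default_rng(0)
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix, n x n

# (2) H acts as the identity on the column space of X:
u = rng.normal(size=p + 1)
print(np.allclose(H @ (X @ u), X @ u))        # True

# (1) H kills vectors orthogonal to the column space of X:
y = rng.normal(size=n)
y_perp = y - H @ y                            # residual part, orthogonal to col(X)
print(np.allclose(H @ y_perp, 0))             # True

# H is symmetric and idempotent, as an orthogonal projection should be:
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # True True
```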

The vector of residuals is $$ \widehat\varepsilon = (I-H)Y $$ i.e. observed minus fitted equals residual. (This should not be confused with the unobservable vector $\varepsilon$ of true errors.)

Therefore $I-H$ will be in the role of the matrix $B$ in the main identity. And we have $$ \operatorname{SSE} = \|\widehat\varepsilon\|^2 = \|(I-H)Y\|^2. $$

The way $\operatorname{SSR}$ is usually defined is as $\sum_{i=1}^n (\widehat Y_i - \overline Y)^2,$ where $\overline Y = (Y_1+\cdots+Y_n)/n,$ the average $Y$ value. This is $$ \|(H-P)Y\|^2 $$ where $P$ is the $n\times n$ matrix whose every entry is $1/n.$
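One way to see this is to note that the vector whose every entry is $\overline Y$ is exactly $PY$, so $$ \sum_{i=1}^n \big(\widehat Y_i - \overline Y\big)^2 = \big\|\widehat Y - PY\big\|^2 = \|HY - PY\|^2 = \|(H-P)Y\|^2. $$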

Thus $H-P$ will be in the role of $A$ in the main identity.

Now apply the main identity. (You will need to show that $(H-P)(I-H)=0.$ For that you need to notice that the columns of $P$ are in the column space of $H$.)
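If you want to sanity-check the algebra numerically before proving it, here is a short sketch (NumPy, simulated data with arbitrary choices of $n$, $p$, $\beta$, $\sigma$):

```python
import numpy as np

# Check SSE = ||(I-H)Y||^2, SSR = ||(H-P)Y||^2, and (H-P)(I-H) = 0 on simulated data.
rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=1.5, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto col(X)
P = np.full((n, n), 1.0 / n)                  # projection onto the all-ones direction

Y_hat = H @ Y
SSE = np.sum((Y - Y_hat) ** 2)
SSR = np.sum((Y_hat - Y.mean()) ** 2)

print(np.allclose(SSE, np.linalg.norm((np.eye(n) - H) @ Y) ** 2))  # True
print(np.allclose(SSR, np.linalg.norm((H - P) @ Y) ** 2))          # True
print(np.allclose((H - P) @ (np.eye(n) - H), 0))                   # True
```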

  • How can you prove $(H-P)(I-H)=0$? – Mahmudul Hasan Oct 29 '21 at 00:06
  • @MahmudulHasan To show $(H-P)(I-H)=0$, you will need two facts. First, $HH = H$, since $H$ is idempotent (it is a projection matrix). Second, $PH=HP=P$. For this second fact, the first equality follows from the second because $H$ and $P$ are both symmetric. The second equality holds because every column of $P$ is $\frac1n\mathbf{1}$, where $\mathbf{1}$ denotes the column vector of all ones. Since the model has an intercept, $\mathbf{1}$ is in the column space of $X$ (hence of $H$), so $H$ acts as the identity on it. With these facts, $(H-P)(I-H)=H-HH-P+PH=H-H-P+P=0$. – eatfood May 13 '24 at 15:27

If $\mathbb{E}(Y\mid X)=X\beta$, then the OLS estimate of $\beta$ is $\hat{\beta}=(X'X)^{-1}X'Y$. Then $$SSE=(Y-X\hat{\beta})'(Y-X\hat{\beta})=(Y-X(X'X)^{-1}X'Y)'(Y-X(X'X)^{-1}X'Y)\\=Y'(I-X(X'X)^{-1}X')'(I-X(X'X)^{-1}X')Y.$$ Similarly $$SSR=Y'X(X'X)^{-1}X'Y.$$ Now $X(X'X)^{-1}X'$ and $I-X(X'X)^{-1}X'$ are:

  • idempotent (which lets the quadratic form for $SSE$ simplify, as noted below),
  • the orthogonal projection matrices onto $\mathcal{C}(X)$ and $\mathcal{C}(X)^{\perp}$, respectively,
  • satisfy $X(X'X)^{-1}X'(I-X(X'X)^{-1}X')=0$, and
  • sum to the identity: $X(X'X)^{-1}X'+(I-X(X'X)^{-1}X')=I$.
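In particular, symmetry and idempotence collapse the quadratic form for $SSE$ above: $$ SSE = Y'(I-X(X'X)^{-1}X')'(I-X(X'X)^{-1}X')Y = Y'(I-X(X'X)^{-1}X')Y. $$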

So by the Fisher–Cochran theorem, $SSE$ and $SSR$ are independent.

QED
  • Your expression for $SSR$ is incorrect, so the argument is not so smooth. – StubbornAtom Apr 19 '22 at 20:57
  • Yeah, I think the correct expression for $SSR$ is $Y' (H-\frac{1}{n} J) Y$ where $J$ is the matrix with all entries $1$. I'm also not sure what the fourth point has to do with the rest of the argument, is it needed for the Fisher Cochran Theorem? – eatfood May 13 '24 at 15:36