
I'm trying to show that SSE and SSR are independent (conditionally on X) but I have to use the following steps/hint. [Hint: Notice you have to consider SSE and SSR as random variables, so be careful how you define them. You may want to use the result that two linear forms U = AX and V = BX, with A and B being constant matrices and X is Normal, are independent iff Cov(U, V) = 0].

I know that the question was posted before but I'm not finding how to prove it using this hint.

Any help is much appreciated!

Elio
  • This isn't actually discrete mathematics, since the normal distribution $$ \frac 1 {\sqrt{2\pi}} e^{-x^2/2} \, dx $$ is at the heart of the matter. $\qquad$ – Michael Hardy May 12 '17 at 18:44

2 Answers


Whether they are independent depends on the distribution of the errors. The context and general conventions make it clear that what is assumed is something like this: $$ Y_i = \alpha + \beta_1 x_{1i} + \cdots + \beta_p x_{pi} + \varepsilon_i $$ where

  • $\varepsilon_i \sim \operatorname{i.i.d. N}(0,\sigma^2)$ for $i=1,\ldots,n$ and typically $n\gg p;$
  • $x_{ji}$ for $j=1,\ldots,p,$ $i=1,\ldots,n$ are constant (i.e. not random) and observable;
  • $\alpha,\beta_1,\ldots,\beta_p$ are constant and unobservable;
  • $Y_i$ for $i=1,\ldots,n$ are observable (and of course random since $\varepsilon_i$ are random).

One can write $$ Y = X\beta + \varepsilon $$ where

  • $Y\in\mathbb R^{n\times 1}$ (a long column vector);
  • $X\in\mathbb R^{n\times(p+1)}$ (a matrix with many rows and few columns);
  • $\beta\in\mathbb R^{(p+1)\times1}$ (a short column vector);
  • $\varepsilon\in\mathbb R^{n\times 1}$ (the same size and shape as $Y$, of course).

We will show that $\operatorname{SSE} = \|AY\|^2$ and $\operatorname{SSR} = \|BY\|^2$ where $A$ and $B$ are certain matrices with $n$ columns (and also $n$ rows, as we will see).

Central to the problem is this identity: $$ \operatorname{cov}(AY, BY) = A\Big( \operatorname{var}(Y) \Big) B^\top \tag{main identity} $$ and here $\operatorname{var}(Y)$ is an $n\times n$ nonnegative-definite matrix and $\operatorname{cov}(AY, BY)$ is a matrix with as many rows as $A$ and as many columns as $B^\top$ (thus also $n\times n$).

By the information in your quoted "Hint", we only need to show that this covariance is $0$ (i.e. the $n\times n$ zero matrix). It will follow that $AY$ and $BY$ are independent, and therefore functions of them are independent.
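For completeness, here is one way to see the main identity, together with its specialization to the present model (where $\operatorname{var}(Y)=\operatorname{var}(\varepsilon)=\sigma^2 I_n$): $$ \operatorname{cov}(AY,BY) = \operatorname E\Big[\big(AY-\operatorname E[AY]\big)\big(BY-\operatorname E[BY]\big)^\top\Big] = A\operatorname E\Big[(Y-\operatorname E Y)(Y-\operatorname E Y)^\top\Big]B^\top = A\Big(\operatorname{var}(Y)\Big)B^\top = \sigma^2 AB^\top. $$ So in this setting it is enough to check that $AB^\top$ is the $n\times n$ zero matrix.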

The vector of fitted values is the orthogonal projection of the vector $Y$ onto the column space of the matrix $X$; it is therefore $$ \widehat Y = HY $$ where $H\in\mathbb R^{n\times n}$ is the "hat matrix" (so called because it transforms $Y$ to $\widehat Y$): $$ H = X\Big( X^\top X\Big)^{-1} X^\top = \underbrace{\quad X\quad}_{n\times(p+1)} \Big( \underbrace{\quad X^\top X \quad}_{(p+1)\times(p+1)} \Big)^{-1} \underbrace{\quad X^\top \quad}_{(p+1)\times n}. $$

To show that $H$ is the orthogonal projection, it suffices to show two things: (1) if $Y$ is orthogonal to the column space, then $HY=0$; that is easy because in that case $X^\top Y=0.$ (2) If $Y$ is in the column space, then $HY=Y$; that is shown by writing $Y=Xu$ for some $u\in\mathbb R^{(p+1)\times1}$ and then multiplying.
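If it helps to see this concretely, here is a small numerical sanity check of the projection properties of $H$ (a NumPy sketch with an arbitrary made-up design matrix; the sizes and variable names are just illustrative):

```python
import numpy as np

# Sanity check of the hat matrix's projection properties on made-up data.
rng = np.random.default_rng(0)
n, p = 10, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix, n x n

# (2) H acts as the identity on the column space of X:
u = rng.normal(size=p + 1)
print(np.allclose(H @ (X @ u), X @ u))        # True

# (1) H kills vectors orthogonal to the column space of X:
y = rng.normal(size=n)
y_perp = y - H @ y                            # residual part, orthogonal to col(X)
print(np.allclose(H @ y_perp, 0))             # True

# H is symmetric and idempotent, as an orthogonal projection should be:
print(np.allclose(H, H.T), np.allclose(H @ H, H))  # True True
```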

The vector of residuals is $$ \widehat\varepsilon = (I-H)Y $$ i.e. observed minus fitted equals residual. (This should not be confused with the unobservable vector $\varepsilon$ of true errors.)

Therefore $I-H$ will be in the role of the matrix $B$ in the main identity. And we have $$ \operatorname{SSE} = \|\widehat\varepsilon\|^2 = \|(I-H)Y\|^2. $$

The way $\operatorname{SSR}$ is usually defined is as $\sum_{i=1}^n (\widehat Y_i - \overline Y)^2,$ where $\overline Y = (Y_1+\cdots+Y_n)/n,$ the average $Y$ value. This is $$ \|(H-P)Y\|^2 $$ where $P$ is the $n\times n$ matrix whose every entry is $1/n.$
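One way to see this is to note that the vector whose every entry is $\overline Y$ is exactly $PY$, so $$ \sum_{i=1}^n \big(\widehat Y_i - \overline Y\big)^2 = \big\|\widehat Y - PY\big\|^2 = \|HY - PY\|^2 = \|(H-P)Y\|^2. $$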

Thus $H-P$ will be in the role of $A$ in the main identity.

Now apply the main identity. (You will need to show that $(H-P)(I-H)=0.$ For that you need to notice that the columns of $P$ are in the column space of $H$.)
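If you want to sanity-check the algebra numerically before proving it, here is a short sketch (NumPy, simulated data with arbitrary choices of $n$, $p$, $\beta$, $\sigma$):

```python
import numpy as np

# Check SSE = ||(I-H)Y||^2, SSR = ||(H-P)Y||^2, and (H-P)(I-H) = 0 on simulated data.
rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=1.5, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto col(X)
P = np.full((n, n), 1.0 / n)                  # projection onto the all-ones direction

Y_hat = H @ Y
SSE = np.sum((Y - Y_hat) ** 2)
SSR = np.sum((Y_hat - Y.mean()) ** 2)

print(np.allclose(SSE, np.linalg.norm((np.eye(n) - H) @ Y) ** 2))  # True
print(np.allclose(SSR, np.linalg.norm((H - P) @ Y) ** 2))          # True
print(np.allclose((H - P) @ (np.eye(n) - H), 0))                   # True
```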

  • How can you prove $(H-P)(I-H)=0$? – Mahmudul Hasan Oct 29 '21 at 00:06
  • @MahmudulHasan To show $(H-P)(I-H)=0$, you will need two facts. First, $HH = H$, since $H$ is idempotent (it is a projection matrix). Second, $PH=HP=P$. For this second fact, the first equality follows from the second because $H$ and $P$ are both symmetric. The second equality holds because every column of $P$ is $\frac1n\mathbf{1}$, where $\mathbf{1}$ denotes the column vector of all ones. Since the model has an intercept, $\mathbf{1}$ is in the column space of $X$ (hence of $H$), so $H$ acts as the identity on it. With these facts, $(H-P)(I-H)=H-HH-P+PH=H-H-P+P=0$. – eatfood May 13 '24 at 15:27

If $\mathbb{E}(Y\mid X)=X\beta$, then the OLS estimate of $\beta$ is $\hat{\beta}=(X'X)^{-1}X'Y$. Then $$SSE=(Y-X\hat{\beta})'(Y-X\hat{\beta})=(Y-X(X'X)^{-1}X'Y)'(Y-X(X'X)^{-1}X'Y)\\=Y'(I-X(X'X)^{-1}X')'(I-X(X'X)^{-1}X')Y.$$ Similarly $$SSR=Y'X(X'X)^{-1}X'Y.$$ Now $X(X'X)^{-1}X'$ and $I-X(X'X)^{-1}X'$ are:

  • idempotent (which lets the quadratic form for $SSE$ simplify, as noted below),
  • the orthogonal projection matrices onto $\mathcal{C}(X)$ and $\mathcal{C}(X)^{\perp}$, respectively,
  • satisfy $X(X'X)^{-1}X'(I-X(X'X)^{-1}X')=0$, and
  • sum to the identity: $X(X'X)^{-1}X'+(I-X(X'X)^{-1}X')=I$.
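In particular, symmetry and idempotence collapse the quadratic form for $SSE$ above: $$ SSE = Y'(I-X(X'X)^{-1}X')'(I-X(X'X)^{-1}X')Y = Y'(I-X(X'X)^{-1}X')Y. $$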

So by the Fisher–Cochran theorem, $SSE$ and $SSR$ are independent.

QED
  • Your expression for $SSR$ is incorrect, so the argument is not so smooth. – StubbornAtom Apr 19 '22 at 20:57
  • Yeah, I think the correct expression for $SSR$ is $Y' (H-\frac{1}{n} J) Y$ where $J$ is the matrix with all entries $1$. I'm also not sure what the fourth point has to do with the rest of the argument, is it needed for the Fisher Cochran Theorem? – eatfood May 13 '24 at 15:36