There are many different ways to look at degrees of freedom. I wanted to provide a rigorous answer that starts from a concrete definition of degrees of freedom for a statistical estimator as this may be useful/satisfying to some readers:
Definition: Given an observational model of the form $$y_i=r(x_i)+\xi_i,\ \ \ i=1,\dots,n,$$ where the $\xi_i\sim\mathcal{N}(0,\sigma^2)$ are i.i.d. noise terms and the $x_i$ are fixed, the degrees of freedom (DOF) of an estimator $\hat{y}$ is defined as $$\text{df}(\hat{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\hat{y}_i,y_i)=\frac{1}{\sigma^2}\text{Tr}(\text{Cov}(\hat{y},y)),$$ or equivalently, by Stein's lemma, $$\text{df}(\hat{y})=\mathbb{E}(\text{div}\,\hat{y}).$$
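To make the covariance definition concrete, here is a minimal Monte Carlo sketch (the mean function $r$, the sample size, and all other values below are made up for illustration): it estimates $\frac{1}{\sigma^2}\sum_i\text{Cov}(\hat{y}_i,y_i)$ across simulated replications for the simple estimator $\hat{y}_i=\overline{y}$, whose degrees of freedom should come out close to $1$.

```python
import numpy as np

# Monte Carlo sketch of the covariance definition of df.  The estimator here
# (using the sample mean ybar as the fit for every point) is illustrative;
# its df should come out close to 1.
rng = np.random.default_rng(0)
n, sigma, reps = 50, 1.0, 20000
x = np.linspace(0, 1, n)
r = np.sin(2 * np.pi * x)                       # arbitrary fixed mean function r(x_i)

Y = r + sigma * rng.standard_normal((reps, n))  # reps independent draws of y
Yhat = np.tile(Y.mean(axis=1, keepdims=True), (1, n))  # hat{y}_i = ybar for all i

# df = (1/sigma^2) * sum_i Cov(hat{y}_i, y_i), estimated across replications
cov_terms = ((Yhat - Yhat.mean(axis=0)) * (Y - Y.mean(axis=0))).mean(axis=0)
print(cov_terms.sum() / sigma**2)               # approximately 1
```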
Using this definition, let's analyze linear regression.
Linear Regression: Consider the model $$y_i=x_i\beta +\xi_i,$$ where the $x_i\in\mathbb{R}^p$ are fixed row vectors of covariates. In your case, $p=2$, each $x_i=[z_i,\ 1]$ consists of a data point and the constant $1$, and $\beta=\left[\begin{array}{c} m\\ b \end{array}\right]$ is a slope and an intercept, so that $x_i \beta=m z_i+b$. This can be rewritten as $$y=X\beta+\xi,$$ where $X$ is the $n\times p$ matrix whose $i^{\text{th}}$ row is $x_i$. The least squares estimator is $\hat{\beta}^{LS}=(X^T X)^{-1}X^Ty$. Let's now use the above definition to calculate the degrees of freedom of $SST$, $SSR$, and $SSE$.
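As a concrete sketch of this setup (the values of $n$, $m$, $b$, $\sigma$ and the points $z_i$ below are made up for illustration), the design matrix with an intercept column and the least squares estimator can be formed as:

```python
import numpy as np

# Illustrative p = 2 design with an intercept column, so the i-th row is [z_i, 1].
rng = np.random.default_rng(1)
n, m, b, sigma = 30, 2.0, -1.0, 0.5
z = rng.uniform(0, 1, n)
X = np.column_stack([z, np.ones(n)])           # n x p design matrix
beta = np.array([m, b])                        # true slope and intercept
y = X @ beta + sigma * rng.standard_normal(n)

beta_ls = np.linalg.solve(X.T @ X, X.T @ y)    # (X^T X)^{-1} X^T y
y_hat = X @ beta_ls                            # fitted values
print(beta_ls)                                 # should be close to [m, b]
```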
$SST:$ For this, we need to calculate $$\text{df}(y_i-\overline{y})=\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(y_i-\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n\text{Cov}(\overline{y},y_i)=n-\frac{1}{\sigma^2}\sum_{i=1}^n \frac{\sigma^2}{n}=n-1.$$
$SSR:$ For this, we need to calculate $$\text{df}(X\hat{\beta}^{LS}-\overline{y})=\frac{1}{\sigma^2}\text{Tr}\left(\text{Cov}(X(X^TX)^{-1}X^Ty,\,y)\right)-\text{df}(\overline{y})$$ $$=-1+\frac{1}{\sigma^2}\text{Tr}\left(X(X^TX)^{-1}X^T\,\text{Cov}(y,y)\right)$$ $$=-1+\text{Tr}\left(X(X^TX)^{-1}X^T\right)$$ $$=p-1,$$ where we used $\text{df}(\overline{y})=1$ (the second sum in the $SST$ calculation above) and $\text{Cov}(y,y)=\sigma^2 I$. In your case $p=2$, since you will want $X$ to include the all-ones vector so that there is an intercept term, and so the degrees of freedom will be $1$. More generally, with $p$ columns in $X$ (including the intercept), the degrees of freedom of $SSR$ is $p-1$, the number of non-intercept parameters.
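Since $X\hat{\beta}^{LS}-\overline{y}\mathbf{1}=(H-\tfrac{1}{n}\mathbf{1}\mathbf{1}^T)y$ with $H=X(X^TX)^{-1}X^T$ is linear in $y$, its degrees of freedom is just the trace of that matrix (by the same $\text{Cov}(y,y)=\sigma^2 I$ argument). The following sketch checks this numerically; the design matrix is made up, matching the $p=2$ case above.

```python
import numpy as np

# Numerical check that df(SSR) = Tr(H - (1/n) 1 1^T) = p - 1,
# using a made-up design matrix with an intercept column (p = 2).
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([rng.uniform(0, 1, n), np.ones(n)])
p = X.shape[1]

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X (X^T X)^{-1} X^T
J = np.ones((n, n)) / n                 # averaging matrix: J y = ybar * 1

print(np.trace(H - J), p - 1)           # both equal 1
```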
$SSE:$ $\text{df}=(n-1)-(p-1)=n-p$, which follows from the linearity of $\text{df}$, since $y-\overline{y}=(X\hat{\beta}^{LS}-\overline{y})+(y-X\hat{\beta}^{LS})$.
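The same trace argument gives a quick numerical check that the three counts are consistent: the centering map $y\mapsto y-\overline{y}\mathbf{1}$ has trace $n-1$ (SST), the residual map $y\mapsto y-X\hat{\beta}^{LS}$ has trace $n-p$ (SSE), and $(p-1)+(n-p)=n-1$. The design matrix below is again made up for illustration.

```python
import numpy as np

# Check that the degrees of freedom of SST, SSR and SSE add up: n-1 = (p-1) + (n-p).
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([rng.uniform(0, 1, n), np.ones(n)])
p = X.shape[1]

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
J = np.ones((n, n)) / n                 # averaging matrix
I = np.eye(n)

print(np.trace(I - J))                  # SST: n - 1 = 29
print(np.trace(H - J))                  # SSR: p - 1 = 1
print(np.trace(I - H))                  # SSE: n - p = 28
```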
- SST is the sum of the squared deviations of the individual responses $y_i$ from the mean response $\bar y$. Subtracting the mean imposes the constraint $\sum_i (y_i-\bar y)=0$, which leaves $n-1$ degrees of freedom among the $y_i$ for determining the value of $SST$.