
I was studying regression models and found the formula:

$$V_Y^2 = V_\hat{Y}^2 + V_\varepsilon^2$$

  • $Y = a+bX+\varepsilon$, which is the dependent variable.

  • $\hat{Y}=a+bX$, which is the predicted value (therefore $Y = \hat{Y} + \varepsilon$)

  • $\varepsilon$ is the prediction error

From this equation we can divide each term by $V_Y^2$ and get:

$$\frac{V_Y^2}{V_Y^2} = \frac{V_\hat{Y}^2}{V_Y^2} + \frac{V_\varepsilon^2}{V_Y^2}$$

$$1 = R^2 + \frac{V_\varepsilon^2}{V_Y^2}$$

$$R^2 = 1 - \frac{V_\varepsilon^2}{V_Y^2}$$

  • $R^2$ is the coefficient of determination

My first question is why we define:

$$R^2 = \frac{V_\hat{Y}^2}{V_Y^2}$$


Moreover, I found that the error variance is:

$$V_\varepsilon^2=\frac{\sum_{i=1}^{N}(y_i-\hat{y}_i)^2}{N}$$

This is because we assume $\bar{\varepsilon}=0$, where $\bar{\varepsilon}$ represents the average of the error $\varepsilon$, from the equation:

$$V_\varepsilon^2=\frac{\sum_{i=1}^{N}(\varepsilon_i-\bar{\varepsilon})^2}{N}$$

Furthermore, $\varepsilon_i=y_i-\hat{y}_i$, so we get the original equation.

My second question is why we make the assumption that $\bar{\varepsilon}=0$.
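
For reference, here is a quick numerical check of both identities (a minimal sketch assuming numpy and a made-up dataset; the variances use the population formula, i.e. `ddof=0`):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=N)   # y = a + b*x + error

b, a = np.polyfit(x, y, 1)                           # least-squares fit of y on x
y_hat = a + b * x                                    # predicted values
eps = y - y_hat                                      # prediction errors (residuals)

V_Y, V_Yhat, V_eps = np.var(y), np.var(y_hat), np.var(eps)

print(np.isclose(V_Y, V_Yhat + V_eps))               # V_Y^2 = V_Yhat^2 + V_eps^2
print(np.isclose(V_Yhat / V_Y, 1 - V_eps / V_Y))     # R^2 computed both ways
print(np.isclose(np.mean(eps), 0.0))                 # the residuals average to zero
```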

  • For your first question the answer follows algebraically. In your first equation for $R^2$, make a common denominator, then use your first equation to substitute for $V_\hat{Y}^2$.

    For your second question, the residuals sum to $0$ (under certain conditions). Maybe consult this thread: https://math.stackexchange.com/questions/494181/why-the-sum-of-residuals-equals-0-when-we-do-a-sample-regression-by-ols. I say "under certain conditions" because, if I recall correctly, the residuals do not have to sum to $0$ if there is no intercept in the model.

    – tarheeljks Feb 06 '25 at 23:15
  • Your original equation $V_Y^2 = V_\hat{Y}^2 + V_\varepsilon^2$ is due to: (a) taking a conceptual model where the errors are assumed to be independent of the $X$s; and (b) the calculations in simple linear regression producing residuals uncorrelated with the predicted values. If you have data $(x_i, y_i)$, then in simple linear regression $\frac{V_\hat{Y}^2}{V_Y^2}$ will equal the square of the correlation between these $x$ and $y$; you might write that as $\hat \rho_{x,y}^2$, and it is a small step to instead write this as $R^2$, both in the simple case and in other more complicated cases. – Henry Feb 06 '25 at 23:30

1 Answer


Question 1: Why do we define $R^2$ this way?

This might be due to the fact that $V_Y^2$ contains the error $\varepsilon$, and we want that error contribution to be small so that $R^2$ is large. For $\varepsilon=0$ we get ${V_\hat{Y}^2} = {V_Y^2}$, and therefore $R^2=1$.
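
As a rough illustration (a minimal sketch assuming numpy; the data are made up), the ratio $V_\hat{Y}^2/V_Y^2$ approaches $1$ as the error term shrinks:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x = rng.normal(size=N)

for noise_sd in [2.0, 0.5, 0.0]:                 # shrink the error toward eps = 0
    y = 1.0 + 2.0 * x + rng.normal(scale=noise_sd, size=N)
    b, a = np.polyfit(x, y, 1)
    y_hat = a + b * x
    r2 = np.var(y_hat) / np.var(y)               # R^2 = V_Yhat^2 / V_Y^2
    print(noise_sd, round(r2, 4))                # R^2 -> 1 as the noise vanishes
```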


Question 2: Why do we assume $\bar{\varepsilon}=0$?

First, we will assume a linear regression fitted by least squares.

This might be due to the fact that we know:

$$\bar{\varepsilon}=\sum_{i=1}^{N}\frac{\varepsilon_i}{N}$$

We also know that:

$$RSS=\sum_{i=1}^{N}\varepsilon_i^2=\sum_{i=1}^{N}(y_i-\hat{y}_i)^2$$

where $RSS$ is the sum of squared residuals.

From $\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_{i1}+\dots+\hat{\beta}_px_{ip}$ (a model with $p$ predictors), we can rewrite it as:

$$RSS=\sum_{i=1}^{N}\bigl(y_i-(\hat{\beta}_0+\hat{\beta}_1x_{i1}+\dots+\hat{\beta}_px_{ip})\bigr)^2$$

We take the partial derivatives of $RSS$ with respect to each $\hat{\beta}_k$, for $k\in\{0,1,\dots,p\}$, and set them to zero to minimize the $RSS$.

For $\hat{\beta_0}$:

$$\frac{\partial RSS}{\partial \hat{\beta_0}}=0$$

Therefore:

$$\frac{\partial RSS}{\partial \hat{\beta_0}}=-2\sum_{i=1}^{N}\bigl(y_i-(\hat{\beta}_0+\hat{\beta}_1x_{i1}+\dots+\hat{\beta}_px_{ip})\bigr)=0$$

$$\sum_{i=1}^{N}\bigl(y_i-(\hat{\beta}_0+\hat{\beta}_1x_{i1}+\dots+\hat{\beta}_px_{ip})\bigr)=0$$

$$\sum_{i=1}^{N}\varepsilon_i=0$$

Generalizing for $k \geq 1$:

$$\frac{\partial RSS}{\partial \hat{\beta_k}} = 0$$

$$\frac{\partial RSS}{\partial \hat{\beta_k}} = -2\sum_{i=1}^{N} x_{ik} \cdot\bigl(y_i-(\hat{\beta}_0+\hat{\beta}_1x_{i1}+\dots+\hat{\beta}_px_{ip})\bigr)=0$$

$$\sum_{i=1}^{N} x_{ik} \cdot\bigl(y_i-(\hat{\beta}_0+\hat{\beta}_1x_{i1}+\dots+\hat{\beta}_px_{ip})\bigr)=0$$

$$\sum_{i=1}^{N} x_{ik} \cdot \varepsilon_i=0$$

so the residuals are also orthogonal to every predictor. The intercept can be viewed as a regressor that is constantly equal to $1$ (take $x_{i0}=1$ for all $i$); substituting this constant regressor into the same equation gives

$$\sum_{i=1}^{N}\varepsilon_i=0$$

We obtain mathematically that the residuals must sum to zero, so:

$$\bar{\varepsilon}=\sum_{i=1}^{N}\frac{\varepsilon_i}{N}=\frac{1}{N}\sum_{i=1}^{N}\varepsilon_i=\frac{1}{N} \cdot 0 = 0$$
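
A quick numerical check of this result (a minimal sketch assuming numpy and made-up data): with an intercept the residuals sum to zero and are orthogonal to each predictor, while without an intercept they generally do not, which is the caveat raised in the comments.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X = rng.normal(size=(N, 2))                          # two predictors
y = 5.0 + X @ np.array([1.0, -2.0]) + rng.normal(size=N)

# OLS with an intercept: prepend a column of ones (a regressor constantly equal to 1).
X1 = np.column_stack([np.ones(N), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
eps = y - X1 @ beta

print(np.isclose(eps.sum(), 0.0))                    # residuals sum to zero
print(np.allclose(X.T @ eps, 0.0))                   # and are orthogonal to each predictor

# Without the intercept column the beta_0 normal equation disappears,
# so the residuals need not sum to zero.
beta_no_int, *_ = np.linalg.lstsq(X, y, rcond=None)
print((y - X @ beta_no_int).sum())                   # generally nonzero
```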