I was studying regression models and found the formula:
$$V_Y^2 = V_\hat{Y}^2 + V_\varepsilon^2$$
where
- $Y = a+bX+\varepsilon$ is the dependent variable,
- $\hat{Y}=a+bX$ is the predicted (fitted) value, so $Y = \hat{Y} + \varepsilon$,
- $\varepsilon$ is the prediction error,
- $V_Y^2$, $V_{\hat{Y}}^2$, $V_\varepsilon^2$ denote the variances of $Y$, $\hat{Y}$, and $\varepsilon$.
Dividing each term of this equation by $V_Y^2$ gives:
$$\frac{V_Y^2}{V_Y^2} = \frac{V_\hat{Y}^2}{V_Y^2} + \frac{V_\varepsilon^2}{V_Y^2}$$
$$1 = R^2 + \frac{V_\varepsilon^2}{V_Y^2}$$
$$R^2 = 1 - \frac{V_\varepsilon^2}{V_Y^2}$$
- $R^2$ is the coefficient of determination
My first question is why we define:
$$R^2 = \frac{V_\hat{Y}^2}{V_Y^2}$$
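To convince myself that the identity holds numerically, I ran a quick check (a minimal sketch using numpy with made-up data; not part of the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.8, size=200)   # made-up data

b, a = np.polyfit(x, y, 1)        # least-squares slope b and intercept a
y_hat = a + b * x                 # fitted values
eps = y - y_hat                   # residuals

var_y, var_yhat, var_eps = y.var(), y_hat.var(), eps.var()   # variances dividing by N

print(np.isclose(var_y, var_yhat + var_eps))              # True: V_Y^2 = V_Yhat^2 + V_eps^2
print(np.isclose(var_yhat / var_y, 1 - var_eps / var_y))   # True: both expressions for R^2 agree
```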
Moreover, I found that the error variance is:
$$V_\varepsilon^2=\frac{\sum_{i=1}^{N}(y_i-\hat{y}_i)^2}{N}$$
This is because we assume $\bar{\varepsilon}=0$, where $\bar{\varepsilon}$ is the average of the errors $\varepsilon_i$, in the general variance formula:
$$V_\varepsilon^2=\frac{\sum_{i=1}^{N}(\varepsilon_i-\bar{\varepsilon})^2}{N}$$
Furthermore, $\varepsilon_i=y_i-\hat{y}_i$, so this reduces to the formula above.
My second question is why we make the assumption that $\bar{\varepsilon}=0$.
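As a sanity check on that assumption, here is a small sketch (again numpy with synthetic data, purely illustrative) showing that an OLS fit with an intercept gives residuals that average to zero, so the two variance formulas coincide:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.8, size=200)   # made-up data

b, a = np.polyfit(x, y, 1)      # least-squares slope and intercept
eps = y - (a + b * x)           # residuals eps_i = y_i - y_hat_i

# With an intercept, the residuals average to zero, so the general formula
# sum((eps_i - eps_bar)^2)/N collapses to sum((y_i - y_hat_i)^2)/N.
print(np.isclose(eps.mean(), 0.0))               # True (up to rounding)
print(np.isclose(eps.var(), np.mean(eps ** 2)))  # True: the two formulas agree
```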
For your second question: the residuals sum to 0 (under certain conditions). Maybe consult this thread: https://math.stackexchange.com/questions/494181/why-the-sum-of-residuals-equals-0-when-we-do-a-sample-regression-by-ols. I say "under certain conditions" because, if I recall correctly, the residuals do not have to sum to 0 if there is no intercept in the model.
– tarheeljks Feb 06 '25 at 23:15
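To make the comment's caveat concrete, here is a small sketch (numpy, synthetic data; the no-intercept fit is added purely for illustration) comparing the residual sums with and without an intercept:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.8, size=200)   # made-up data

# Fit with an intercept: the residuals sum (essentially) to zero.
b, a = np.polyfit(x, y, 1)
print(np.sum(y - (a + b * x)))         # ~0

# Fit through the origin (no intercept): the residual sum is generally nonzero.
b0 = np.sum(x * y) / np.sum(x * x)     # least-squares slope without an intercept
print(np.sum(y - b0 * x))              # typically far from 0, since the true intercept is 2
```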