I would like to ask for an interpretation, both mathematical and intuitive if possible, of the homoscedasticity assumption on the errors (i.e., that the errors have constant variance) in linear regression models.
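To make the question concrete, my understanding (please correct me if this is off) is that homoscedasticity means the error variance is the same constant for every observation,
$$\operatorname{Var}(\epsilon_i) = \sigma^2 \quad \text{for all } i = 1, \dots, n,$$
as opposed to heteroscedasticity, where $\operatorname{Var}(\epsilon_i) = \sigma_i^2$ may differ across observations.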
If there is correlation among the error terms, how does this affect the estimated standard errors of the regression coefficients $\hat{\beta}_i$ and the resulting confidence and prediction intervals, assuming we keep the usual assumptions (uncorrelated, homoscedastic errors) and run an ordinary linear regression anyway? Do the estimated standard errors underestimate or overestimate the true standard errors, which depend on the actual error covariance $\operatorname{Var}(\epsilon)$, and why?
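My rough attempt at the math (which may well be wrong): if the errors have covariance matrix $\operatorname{Var}(\epsilon) = \Sigma$, then the true covariance of the OLS estimator is
$$\operatorname{Var}(\hat{\beta}) = (X^\top X)^{-1} X^\top \Sigma X (X^\top X)^{-1},$$
whereas the usual formula reported by software assumes $\Sigma = \sigma^2 I$ and reduces to
$$\widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1}.$$
Is the comparison between these two expressions what determines whether the reported standard errors are too small or too large when the errors are (say, positively) correlated?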
My question arises from the section on "Correlation of Error Terms" in the book "An Introduction to Statistical Learning". The passage is as follows:
An important assumption of the linear regression model is that the error terms, $\epsilon_1, \epsilon_2, ..., \epsilon_n$, are uncorrelated. What does this mean? For instance, if the errors are uncorrelated, then the fact that $\epsilon_i$ is positive provides little or no information about the sign of $\epsilon_{i+1}$. The standard errors that are computed for the estimated regression coefficients or the fitted values are based on the assumption of uncorrelated error terms. If in fact there is correlation among the error terms, then the estimated standard errors will tend to underestimate the true standard errors. As a result, confidence and prediction intervals will be narrower than they should be. For example, a 95% confidence interval may in reality have a much lower probability than 0.95 of containing the true value of the parameter. In addition, p-values associated with the model will be lower than they should be; this could cause us to erroneously conclude that a parameter is statistically significant. In short, if the error terms are correlated, we may have an unwarranted sense of confidence in our model. As an extreme example, suppose we accidentally doubled our data, leading to observations and error terms identical in pairs. If we ignored this, our standard error calculations would be as if we had a sample of size $2n$, when in fact we have only $n$ samples. Our estimated parameters would be the same for the $2n$ samples as for the $n$ samples, but the confidence intervals would be narrower by a factor of $\sqrt{2}$!
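To check whether I understand the doubled-data example, I tried the small simulation below (a rough sketch in Python with NumPy; the `ols_se` helper and the simulated model are just my own construction, not taken from the book). Duplicating the data leaves the coefficient estimates unchanged, but the usual standard errors shrink by roughly $\sqrt{2}$, which seems to match the quoted passage.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_se(X, y):
    """OLS estimates and the usual standard errors, which assume
    uncorrelated, homoscedastic errors: sqrt(diag(sigma2_hat * (X'X)^-1))."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)          # RSS / (n - p)
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    return beta_hat, se

# Simulate y = 1 + 2x + eps with independent N(0, 1) errors
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(scale=1.0, size=n)

beta_n, se_n = ols_se(X, y)

# "Accidentally double" the data: every observation (and its error) appears twice
X2 = np.vstack([X, X])
y2 = np.concatenate([y, y])
beta_2n, se_2n = ols_se(X2, y2)

print("estimates (n):  ", beta_n)
print("estimates (2n): ", beta_2n)        # identical point estimates
print("SE ratio n / 2n:", se_n / se_2n)   # approximately sqrt(2) ~ 1.414
```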
I hope my question is clear. Many thanks in advance for sharing your insights!