
I'm new to linear regression, so I don't really understand this question.

Consider the general multiple-regression equation $$Y=\beta_0 +\beta_1 x_1 +\beta_2 x_2 + \dots + \beta_k x_k +\epsilon$$ An alternative procedure for calculating the least-squares coefficient $\hat{\beta}_1$ is as follows:

  1. Regress $Y$ on $X_2$ through $X_k$, obtaining $E_{Y|2\dots k}$.
  2. Regress $X_1$ on $X_2$ through $X_k$, obtaining $E_{1|2\dots k}$.
  3. Regress the residuals $E_{Y|2\dots k}$ on the residuals $E_{1|2\dots k}$. The slope for this simple regression is the multiple-regression slope for $X_1$, which is $\hat{\beta}_1$.

So for 1), I think of it as $E(y|x_2,x_3,\dots,x_k)=\beta_0+\beta_1E(x_1)+\beta_2x_2+\dots+\beta_kx_k$, right? However, I'm not sure what to do for numbers 2 and 3. Can somebody please give me an idea?


1 Answer


First, a review of multiple regression methodology. When you perform a regression of a response variable $Y$ on predictor variables $X_0, X_1, X_2,\ldots, X_k$, the procedure is to construct a design matrix $X$ that holds the predictors as columns. The multiple regression model then becomes $Y=X\beta+\epsilon$ where $\beta$ and $\epsilon$ are column vectors of appropriate size. The least squares estimators $\hat\beta$ for the unknown vector $\beta$ will then satisfy the normal equations $(X^TX)\hat\beta=X^TY$, whence $$\hat\beta=(X^TX)^{-1}X^TY.\tag{$\dagger$}$$ The residual vector $E$ is defined as the difference between the observed $Y$ and the predicted $\hat Y:=X\hat\beta$, so that $$E := Y-X\hat\beta=Y-X(X^TX)^{-1}X^TY.$$ It is convenient to introduce the hat matrix $H:=X(X^TX)^{-1}X^T$ so we can write the residual vector in the tidier form $$E=(I-H)Y.\tag{$\ast$}$$
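If it helps to see these formulas in action, here is a minimal numerical sketch in Python/NumPy; the data, dimensions, and coefficient values are all made up purely for the illustration.

```python
import numpy as np

# Synthetic data: an intercept column plus p random predictors (illustrative only).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5, 0.3])
Y = X @ beta + rng.normal(scale=0.1, size=n)

# Normal equations: (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Hat matrix H = X (X'X)^{-1} X' and residuals E = (I - H) Y
H = X @ np.linalg.solve(X.T @ X, X.T)
E = (np.eye(n) - H) @ Y

print(np.allclose(E, Y - X @ beta_hat))  # True: both expressions give the same residuals
```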


Now apply this to your situation. In parts (1) and (2) you are being asked to consider $X_2$ through $X_k$ (and an intercept term) as predictors in a model. For part (1) the model is $$E(y\mid x_2,\ldots,x_k)=\alpha_0+\alpha_2x_2 +\cdots+\alpha_kx_k,$$ whereas for (2) it's $$E(x_1\mid x_2,\ldots,x_k)=\gamma_0+\gamma_2x_2 +\cdots+\gamma_kx_k,$$ for unknown parameters $\alpha_0,\ldots,\alpha_k$ and $\gamma_0,\ldots,\gamma_k$ to be estimated. So your design matrix is the same for parts (1) and (2): You pack a column vector of ones, and the regressor columns $X_2$ through $X_k$ into your design matrix $X_{2..k}$. In part (1) the response variable is $Y$, whereas in (2) the response variable is $X_1$. Construct the hat matrix $H_{2..k}$. The residuals for these two regressions are then given by formula ($\ast$): $$E_{Y\mid 2..k}=(I-H_{2..k})Y,\qquad E_{X_1\mid 2..k}=(I-H_{2..k})X_1.$$

In part (3) you perform a simple linear regression of column vector $E_{Y\mid 2..k}$ (treated as a response) on column vector $E_{X_1\mid 2..k}$ (treated as a predictor). Apply the usual formula to obtain the slope of the regression line. Alternatively, you can use the matrix formulation to set up a design matrix (with two columns) and compute the least squares estimator of the slope using $(\dagger)$.
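To convince yourself that the three-step procedure really reproduces $\hat\beta_1$, here is a small self-contained check in Python/NumPy with $k=3$; every variable name and number is made up for the demonstration.

```python
import numpy as np

# Synthetic data for the check (illustrative only).
rng = np.random.default_rng(1)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 0.5 * X2 + 0.3 * X3 + rng.normal(scale=0.1, size=n)
ones = np.ones(n)

# Full regression of Y on an intercept, X1, X2, X3: beta_hat[1] is the coefficient on X1.
X_full = np.column_stack([ones, X1, X2, X3])
beta_hat = np.linalg.solve(X_full.T @ X_full, X_full.T @ Y)

# Parts (1) and (2): residuals from regressing Y and X1 on the remaining predictors.
X_rest = np.column_stack([ones, X2, X3])
H_rest = X_rest @ np.linalg.solve(X_rest.T @ X_rest, X_rest.T)
e_Y = Y - H_rest @ Y    # E_{Y|2..k}
e_1 = X1 - H_rest @ X1  # E_{1|2..k}

# Part (3): simple regression of e_Y on e_1; both residual vectors have mean zero,
# so the slope is just a ratio of dot products.
slope = (e_1 @ e_Y) / (e_1 @ e_1)

print(np.isclose(slope, beta_hat[1]))  # True
```

The final comparison prints `True` because the residual-on-residual slope agrees with the coefficient on $X_1$ from the full regression; this is exactly the statement of the procedure (the Frisch–Waugh–Lovell theorem).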

Notice that in this context the column vectors you are dealing with both sum to zero (because the residuals sum to zero whenever there is an intercept term in the model); this gives an especially concise form for the slope of the simple linear regression line.
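Explicitly, writing $e_{Y,i}$ and $e_{1,i}$ for the components of $E_{Y\mid 2..k}$ and $E_{1\mid 2..k}$, the zero-mean property lets you drop the usual mean-correction terms, so the slope from part (3) is simply $$\hat\beta_1=\frac{\sum_i e_{1,i}\,e_{Y,i}}{\sum_i e_{1,i}^2}=\frac{E_{1\mid 2..k}^{T}\,E_{Y\mid 2..k}}{E_{1\mid 2..k}^{T}\,E_{1\mid 2..k}}.$$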
