
The standard proof of omitted variable bias is fairly simple:

Assume the true model is $$Y = X\beta + Z\delta + U$$

and we estimate naively $$Y = X\beta + W$$

The OLS estimate from the misspecified model is then $$\hat\beta = [X^TX]^{-1}X^TY = [X^TX]^{-1}X^T[X\beta + Z\delta + U] = \beta + [X^TX]^{-1}X^T[Z\delta + U]$$

Taking the expectation conditional on $X$ (assuming $U$ is mean zero and independent of $X$): $$\mathbb{E}[\hat\beta\vert X] = \beta + [X^TX]^{-1}\mathbb{E}[X^TZ\vert X]\delta$$

The standard proof then shows that $\hat\beta$ will be biased if $X$ and $Z$ are correlated.

But what if the means of $X$ and $Z$ are both nonzero? For example, suppose $Z$ is independent of $X$ and that $\mathbb{E}X\neq 0 \neq \mathbb{E}Z$.

Then we have $$\mathbb{E}[\hat\beta\vert X] = \beta + [X^TX]^{-1}\mathbb{E}[X^TZ\vert X]\delta = \beta + [X^TX]^{-1}X^T\mathbb{E}[Z]\delta \neq \beta$$

Doesn't that mean that our estimate of $\beta$ can be biased even when the omitted variable is independent of the other regressors, as long as $\mathbb{E}Z \neq 0$?
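To make this concrete, here is a minimal simulation sketch (the particular values $\beta = 2$, $\delta = 1$, $\mathbb{E}X = 1$, $\mathbb{E}Z = 3$ are made up for illustration): $Y$ is regressed on $X$ alone, with no intercept, while the independent but nonzero-mean $Z$ is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 2000
beta, delta, mu_z = 2.0, 1.0, 3.0  # illustrative values, not from the post

estimates = []
for _ in range(reps):
    x = rng.normal(1.0, 1.0, n)   # E[X] = 1 (nonzero)
    z = rng.normal(mu_z, 1.0, n)  # independent of x, E[Z] = 3 (nonzero)
    u = rng.normal(0.0, 1.0, n)
    y = beta * x + delta * z + u
    # regress y on x alone, with NO intercept column
    b_hat = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)[0][0]
    estimates.append(b_hat)

print(np.mean(estimates))  # ~3.5, well above beta = 2: the E[Z] term leaks in
```

The average estimate lands near $\beta + \delta\,\mathbb{E}[XZ]/\mathbb{E}[X^2] = 2 + 3/2 = 3.5$, exactly the bias the derivation above predicts.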

1 Answer


It is true, as long as you don't include the intercept in the regression (a column of 1's in the $X$ matrix).

If you do include an intercept, then

$Y=X\beta+\delta Z+U$

Let's assume that the first column of $X$ is a column of 1's. Then

$Y=1_n\beta_1+X_{-1}\beta_{-1}+\delta Z+U$

where $X_{-1}$ denotes the $X$ matrix without its first column ($\beta_{-1}$ is defined analogously).

It can be shown that this can be rewritten as

$Y=1_n\alpha+X_{-1}^M\beta_{-1}+\delta Z^M+U$

where $Z^M=Z-\bar Z$, $X_{-1}^M$ is defined analogously (each column of $X_{-1}$ is centered), and the constant $\alpha=\beta_1+\bar X_{-1}\beta_{-1}+\delta\bar Z$ absorbs the means.

Now, it follows that the estimator of $\beta_{-1}$ is simply $(X_{-1}^{M'}X_{-1}^{M})^{-1}X_{-1}^{M'}Y$, because each column of $X_{-1}^M$ is orthogonal to $1_n$.
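This is a special case of the Frisch-Waugh-Lovell theorem, and it is easy to verify numerically. A small sketch with made-up data: the slopes from regressing $Y$ on $[1_n, X_{-1}]$ coincide with the slopes from regressing $Y$ on the centered $X_{-1}^M$ alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(2.0, 1.0, (n, 2))  # the non-constant columns X_{-1}
y = 1.0 + X1 @ np.array([0.5, -1.5]) + rng.normal(size=n)

# regression with an explicit intercept column
X = np.column_stack([np.ones(n), X1])
coef_full = np.linalg.lstsq(X, y, rcond=None)[0]

# regression of y on the centered columns only, no intercept
X1c = X1 - X1.mean(axis=0)
coef_centered = np.linalg.lstsq(X1c, y, rcond=None)[0]

print(coef_full[1:])   # slopes from the regression with intercept
print(coef_centered)   # identical slopes from the centered regression
```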

Now, because of the orthogonality it also follows that $\hat\beta_{-1}$ is unbiased, since

$E[(X_{-1}^{M'}X_{-1}^{M})^{-1}X_{-1}^{M'}Y]=E[(X_{-1}^{M'}X_{-1}^{M})^{-1}X_{-1}^{M'}(1_n\alpha+X_{-1}^M\beta_{-1}+\delta Z^M+U)]=\beta_{-1}$

where the $1_n\alpha$ term drops out by orthogonality, and the $Z^M$ and $U$ terms vanish in expectation ($Z$ is independent of $X$ and $\mathbb{E}Z^M=0$).
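To tie this back to the question, here is a sketch with the same illustrative numbers as above ($\beta_1 = 1$, slope $2$, $\delta = 1$, $\mu_z = 3$): once the intercept is included, the slope on $X$ is unbiased and the intercept estimate absorbs $\delta\mu_z$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 2000
beta1, beta_x, delta, mu_z = 1.0, 2.0, 1.0, 3.0  # illustrative values

intercepts, slopes = [], []
for _ in range(reps):
    x = rng.normal(1.0, 1.0, n)
    z = rng.normal(mu_z, 1.0, n)  # independent of x
    u = rng.normal(0.0, 1.0, n)
    y = beta1 + beta_x * x + delta * z + u
    X = np.column_stack([np.ones(n), x])  # intercept included this time
    b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
    intercepts.append(b0)
    slopes.append(b1)

print(np.mean(slopes))      # ~2.0: the slope is unbiased
print(np.mean(intercepts))  # ~4.0 = beta1 + delta*mu_z: the intercept soaks up E[Z]
```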

  • But that still leaves you with $\mathbb{E}[\hat\beta\vert X] = \beta + [X^TX]^{-1}X^T\mathbb{E}[Z^* + \mu_z]\delta = \beta + [X^TX]^{-1}X^T\mathbb{E}[Z]\delta \neq \beta$. I don't see how that's different. – measure_theory May 10 '18 at 19:20
  • Because you will get $E[Z^*]$ not $Z$. The $\delta\mu_z$ gets absorbed by the $\beta_i$ that corresponds to the intercept. So $\beta$ is unbiased except for $\beta_i$ because $\hat\beta_i$ now estimates $\beta_i+\delta\mu_z$. – Daniel Ordoñez May 10 '18 at 19:23
  • Why would it only impact the intercept? If you assume $\delta = [1, 1, ..., 1]$ for example, the bias at each element of $\beta$ will be $\mu_z$ times the contribution of $[X^TX]^{-1}X^T$. – measure_theory May 10 '18 at 19:30
  • I guess I'm just confused how this matches up with $\hat{\beta} = \beta + [X^TX]^{-1}X^T\mu_z\delta$ as I had it, because that must mean that $[X^TX]^{-1}X^T\mu_z\delta = [bias, 0]^T$ assuming $\beta = [\beta_1, \beta_{-1}]^T$. But that doesn't really seem right to me... – measure_theory May 10 '18 at 20:05
  • Yes, the mean of $Z$ has an effect on $\beta_1$ but not on the other coefficients. – Daniel Ordoñez May 10 '18 at 20:08
  • Right, but that must mean that $[X^TX]^{-1}X^T = [A,0]^T$ assuming $\mu_z$ and $\delta$ are constants (for some $A$). How can that be possible? – measure_theory May 10 '18 at 20:10
  • I'm not disagreeing with what you write, I'm just confused how what I write and what you write are equivalent. – measure_theory May 10 '18 at 20:11
  • It means that $(X^TX)^{-1}X^T1_n=(A,0)^T$, and this is true as long as $X$ has a column of 1's. But it might be very hard to see directly (see the numeric check below). – Daniel Ordoñez May 10 '18 at 20:22
  • That makes sense, thanks. – measure_theory May 10 '18 at 20:29
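For what it's worth, the identity in that last exchange is straightforward to check numerically, and the reason it holds is that $1_n$ is itself a column of $X$: regressing $1_n$ on $X$ fits perfectly with coefficient vector $(1, 0, \dots, 0)^T$. A quick sketch (the design matrix below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
# arbitrary design matrix whose first column is 1's
X = np.column_stack([np.ones(n), rng.normal(2.0, 1.0, (n, 3))])

coef = np.linalg.solve(X.T @ X, X.T @ np.ones(n))
print(np.round(coef, 8))  # [1. 0. 0. 0.]: only the intercept coordinate is nonzero
```

So $(X^TX)^{-1}X^T1_n\,\mu_z\delta$ hits only the intercept coordinate, which is exactly why the bias shows up there and nowhere else.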