
I understand the maximum likelihood and least squares methods individually for parameter estimation. It appears that maximum likelihood is very general, while the least squares solution applies to a narrower class of problems, but I am not sure. Can anyone point out under what conditions both methods give the same estimate of a parameter?


1 Answer


The normal equations arise as the maximum likelihood estimate when you assume a linear model with independent Gaussian noise. To make this explicit, assume that each observation follows

$$ y = x^T\beta + \epsilon $$

where $\epsilon \sim N(0, v)$ with $v > 0$. Given a set of i.i.d. observations $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ with $x_i \in \mathbb{R}^p$, we can write this as $Y \sim \text{Normal}(X\beta, vI)$, where $Y = [y_1, \ldots, y_n]^T$, $X$ is the $n \times (p+1)$ matrix whose $i^\text{th}$ row is $(1, x_i)$, $\beta$ is the vector of $p+1$ coefficients (intercept plus $p$ slopes), and $I$ is the $n \times n$ identity matrix. Hence the likelihood satisfies

$$ l(\beta) := p (Y|X, \beta) \propto \exp \left(-\frac{1}{2v}||Y - X\beta||^2\right). $$
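For concreteness, here is a small NumPy sketch of this model. The sample size, noise variance, and coefficients below are made-up illustrative values, not anything implied by the question:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the example is reproducible

n, p = 200, 2                            # illustrative sizes (assumed)
v = 0.25                                 # noise variance
beta_true = np.array([1.0, 2.0, -0.5])   # intercept plus p slopes

x = rng.normal(size=(n, p))              # raw covariates x_i
X = np.column_stack([np.ones(n), x])     # i-th row of X is (1, x_i)
Y = X @ beta_true + rng.normal(scale=np.sqrt(v), size=n)  # Y ~ Normal(X beta, v I)
```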

Since the logarithm is strictly increasing, maximizing this likelihood is equivalent to maximizing

$$ \log l(\beta) = \log C - \frac{1}{2v}||Y - X\beta||^2 $$

where $C$ is a constant. Hence, viewing this as a function of $\beta$, it suffices to minimize the squared norm:

$$ ||Y - X\beta||^2. $$
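Continuing the sketch above, one can check numerically that minimizing the negative log-likelihood lands on the same $\beta$ as minimizing the squared norm. The names `neg_log_lik` and `beta_mle` are ones I am introducing for illustration:

```python
from scipy.optimize import minimize

# Negative log-likelihood, dropping the constant log C; since v > 0 merely
# scales the objective, its minimizer is the minimizer of ||Y - X beta||^2.
def neg_log_lik(beta):
    r = Y - X @ beta
    return r @ r / (2 * v)

beta_mle = minimize(neg_log_lik, x0=np.zeros(X.shape[1])).x
```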

Setting the gradient of this objective with respect to $\beta$ to zero yields the normal equations $X^TX\hat{\beta} = X^TY$. When $X^TX$ is invertible, their solution is the familiar least-squares estimate

$$ \hat{\beta} = (X^TX)^{-1}X^TY, $$

so maximum likelihood under this Gaussian model and least squares give exactly the same estimate.
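And indeed, solving the normal equations on the simulated data recovers the same estimate as the numerical maximization above. This continues the sketch; `np.linalg.solve` is used rather than forming the explicit inverse, which is the numerically stabler choice:

```python
# Solve X^T X beta = X^T Y; equivalent to (X^T X)^{-1} X^T Y but stabler.
beta_ls = np.linalg.solve(X.T @ X, X.T @ Y)

print(np.allclose(beta_mle, beta_ls, atol=1e-5))  # True, up to optimizer tolerance
```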