
I am facing a proof problem about Maximum Likelihood Estimation, summarized in this image:

[Image: Issue for demonstration]

Indeed, I don't know how to prove the equality between the following two expressions:

(1)

$$\begin{aligned} \operatorname{var}(\hat{\theta}) &=E\left[(\hat{\theta}-\theta)(\hat{\theta}-\theta)^{\prime}\right] \\ &=E\left[\left[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right]^{-1} \frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}\left[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right]^{-1}\right] \end{aligned}$$

(2)

$$\begin{aligned} \operatorname{var}(\hat{\theta}) &=E\left[\left[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right]^{-1} \frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\left[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right]^{-1}\right] \\ &=\left(-E\left[\frac{\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right]\right)^{-1} \end{aligned}$$

Equality between (1) and (2) supposes that:

$$\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}=\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}$$

This is the equality I would like to prove.

1) Is there only an approximation between the two, rather than an exact equality?

It is said that "If the model is correctly specified, then the expectation of the outer product of the scores (the middle bit) is equal to the information matrix".

2) What does "if the model is correctly specified" mean ?

Maybe a Taylor expansion could help me prove this equality, but for now I can't manage to do it.
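To make question 1) concrete, here is a scalar sanity check I tried (my own toy example, not from the notes): for a single observation $x$ from an exponential density $f(x;\theta)=\theta e^{-\theta x}$, the log-likelihood is $\mathcal{L}(\theta)=\log\theta-\theta x$, so

$$\frac{\partial \mathcal{L}}{\partial \theta}=\frac{1}{\theta}-x, \qquad \frac{-\partial^{2} \mathcal{L}}{\partial \theta^{2}}=\frac{1}{\theta^{2}}.$$

Pointwise, $\left(\frac{1}{\theta}-x\right)^{2}\neq\frac{1}{\theta^{2}}$ in general, but $E\left[\left(\frac{1}{\theta}-X\right)^{2}\right]=\operatorname{var}(X)=\frac{1}{\theta^{2}}$ since $E[X]=\frac{1}{\theta}$, which suggests the equality can hold only in expectation; hence my question 1).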

UPDATE 1: Thanks to @Max, the proof is not very difficult. But one last request: if I use the $\log$ of the likelihood $\mathcal{L}$, taking $\mathcal{L} = \log\big(\Pi_{i}\,f(x_{i})\big)$ with $x_{i}$ the experimental/observed values, I have difficulties finding the same relation.

We have: $\dfrac{\partial \mathcal{L}}{\partial \theta_{i}} = \dfrac{\partial \log\big(\Pi_{k}\,f(x_{k})\big)}{\partial \theta_{i}} = \dfrac{\partial \sum_{k}\log f(x_{k})}{\partial \theta_{i}} =\sum_{k}\,\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}}$

Now I have to compute: $\dfrac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\dfrac{\partial}{\partial \theta_j} \left(\sum_{k}\,\dfrac{1}{f(x_{k})}\,\dfrac{\partial f(x_{k})}{\partial \theta_{i}} \right)$ $= \sum_{k} \Big(-\dfrac{1}{f(x_{k})^2} \dfrac{\partial f(x_{k})}{\partial \theta_{j}}\dfrac{\partial f(x_{k})}{\partial \theta_{i}}+\dfrac{1}{f(x_{k})}\,\dfrac{\partial^{2} f(x_{k})}{\partial \theta_i \partial \theta_j}\Big)$ $=-\sum_{k}\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\, \dfrac{\partial \log f(x_{k})}{\partial \theta_{j}}+\sum_{k}\dfrac{1}{f(x_{k})}\, \dfrac{\partial^{2} f(x_{k})}{\partial \theta_{i} \partial \theta_{j}}$

So, since the second term vanishes (in expectation) under regularity conditions, we get:

$-\sum_{k}\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\, \dfrac{\partial \log f(x_{k})}{\partial \theta_{j}}\quad\quad(\ast)$

But I don't know how to conclude, since I can't make the product of the two derivatives of $\mathcal{L}$ appear, i.e. I would like to obtain from $(\ast)$ the product $\dfrac{\partial \mathcal{L}}{\partial \theta_i}\,\dfrac{\partial \mathcal{L}}{\partial \theta_j}$.

UPDATE 2: I realized that I may separate $\sum_{k}$ and $\sum_{l}$, and likewise exchange $\partial$ and $\sum$, so I can write:

$$\begin{aligned} \dfrac{\partial \log\big(\Pi_{k} f(x_{k})\big)}{\partial \theta_{i}}\,\dfrac{\partial \log\big(\Pi_{k}f(x_{k})\big)}{\partial \theta_{j}} &=\sum_{k}\sum_{l}\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}\bigg) \\ &=\sum_{k}\bigg(\dfrac{\partial \log f(x_{k})}{\partial \theta_{i}}\bigg)\sum_{l}\bigg(\dfrac{\partial \log f(x_{l})}{\partial \theta_{j}}\bigg) \\ &=\bigg(\dfrac{\partial \log\big(\Pi_{k}f(x_{k})\big)}{\partial \theta_{i}}\bigg)\bigg(\dfrac{\partial \log\big(\Pi_{l}f(x_{l})\big)}{\partial \theta_{j}}\bigg) \\ &=\dfrac{\partial \mathcal{L}}{\partial \theta_i}\, \dfrac{\partial \mathcal{L}}{\partial \theta_j} \end{aligned}$$

Is this demonstration correct, i.e. the separation and permutation of the sums?

Regards

  • What does $\cal L'$ mean (with the dash)? – TheSimpliFire Nov 24 '19 at 10:18
  • @TheSimpliFire: $\cal L'$ corresponds to the likelihood of parameter $\theta'$ and $\cal L$ to the likelihood of $\theta$. You can see it in eq. (34) at the beginning. –  Nov 24 '19 at 14:07
  • But surely $\mathcal L=\mathcal L'$. Should it not be $\frac{\partial\cal L}{\partial \theta}\frac{\partial\cal L}{\partial\theta'}$? And how are $\theta$ and $\theta'$ related? – TheSimpliFire Nov 24 '19 at 14:09
  • @TheSimpliFire, not exactly: I have to prove that $\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}=\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}$ –  Nov 24 '19 at 14:10
  • If $\theta,\theta'$ come from the same p.d.f. then the likelihoods $\cal L$ and $\cal L'$ are the same irrespective of the parameter. E.g. for a normal distribution the parameters are $\mu$ and $\sigma^2$ but the likelihood function is still $\mathcal L(\mu,\sigma^2\mid x)=\frac1{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac1{2\sigma^2}\sum\limits_{i=1}^n(x_i-\mu)^2\right)$. – TheSimpliFire Nov 24 '19 at 14:14
  • But why are they using this development: $\begin{aligned} 0 &=\left.\frac{\partial \mathcal{L}}{\partial \theta}\right|_{\hat{\theta}} \\ &=\left.\frac{\partial \mathcal{L}}{\partial \theta}\right|_{\theta}+\frac{\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}(\hat{\theta}-\theta) \\ \hat{\theta}-\theta &=-\left[\frac{\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right]^{-1} \frac{\partial \mathcal{L}}{\partial \theta} \end{aligned}$? Is it not correct? –  Nov 24 '19 at 14:22
  • The notes from which this is taken do not seem very well written (there are multiple issues, not just what you have quoted). You should probably look for a better text. – Max Nov 26 '19 at 12:12
  • @Max: I didn't think that the following equality: $\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}=\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}$ would be so hard to prove. Could you please give other references that don't have the multiple issues of my source at the beginning of my post? Regards –  Nov 26 '19 at 16:22

1 Answer


The equation you are after is not $\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}=\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}$, but rather $$E[\frac{\partial \mathcal{L}}{\partial \theta} \frac{\partial \mathcal{L}^{\prime}}{\partial \theta}]=E[\frac{-\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}].$$

In more usual notation

$$E[\frac{\partial \mathcal{L}}{\partial \theta_i} \frac{\partial \mathcal{L}}{\partial \theta_j}]=E[\frac{-\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}].$$
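Equivalently, writing $s=\frac{\partial \mathcal{L}}{\partial \theta}$ for the score vector, both sides above are the $(i,j)$ entry of the Fisher information matrix

$$\mathcal{I}(\theta)=E\left[s\,s^{\prime}\right]=-E\left[\frac{\partial^{2} \mathcal{L}}{\partial \theta \partial \theta^{\prime}}\right],$$

which is exactly the statement quoted in your question, that the expectation of the outer product of the scores equals the information matrix.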

Now, by definition $\mathcal{L}=\log p$, so by the chain rule $\frac{\partial \mathcal{L}}{\partial \theta_i} =\frac{1}{p} \frac{\partial p}{\partial \theta_i}$, and differentiating again

$$\frac{\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\frac{\partial}{\partial \theta_j} \left(\frac{1}{p} \frac{\partial p}{\partial \theta_i} \right)=-\frac{1}{p^2} \frac{\partial p}{\partial \theta_j}\frac{\partial p}{\partial \theta_i}+\frac{1}{p} \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j}=-\frac{\partial \mathcal{L}}{\partial \theta_i} \frac{\partial \mathcal{L}}{\partial \theta_j} + \frac{1}{p} \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j}.$$
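Rearranging, the pointwise gap between the two sides of the identity you were trying to prove is exactly the last term:

$$\frac{-\partial^{2} \mathcal{L}}{\partial \theta_i \partial \theta_j}=\frac{\partial \mathcal{L}}{\partial \theta_i} \frac{\partial \mathcal{L}}{\partial \theta_j}-\frac{1}{p} \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j},$$

so (answering your question 1) the equality cannot hold pointwise in general; only the expectations agree.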

Now we simply take the expectation of both sides, which means multiplying by $p$ and integrating. We almost get what we want, except for the extra term $\int \frac{1}{p} \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j} p\, dX=\int \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j}\,dX$. However, $\int p\, dX=1$ independently of $\theta$, so under regularity conditions allowing differentiation with respect to the parameter to pass inside the integral, $\int \frac{\partial p}{\partial \theta_i }\,dX=0$ and $\int \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j}\,dX =0$. The extra term therefore vanishes, and we get what we want.
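Spelled out, the interchange that kills the extra term is (assuming the regularity conditions permit moving the derivatives outside the integral):

$$\int \frac{\partial^{2} p}{\partial \theta_i \partial \theta_j}\,dX=\frac{\partial^{2}}{\partial \theta_i \partial \theta_j} \int p\,dX=\frac{\partial^{2}}{\partial \theta_i \partial \theta_j}\,1=0.$$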

More or less all of this is written in https://en.wikipedia.org/wiki/Fisher_information#Definition

It is my current understanding that many of the other statements in the notes you link to are incorrect. In particular, the variance of the MLE is not in general given by the inverse of the Fisher information matrix.
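For reference, the standard result is only asymptotic: under the usual regularity conditions, for $n$ i.i.d. observations with per-observation information $\mathcal{I}_1(\theta)$,

$$\sqrt{n}\,(\hat{\theta}-\theta) \xrightarrow{d} N\!\left(0,\ \mathcal{I}_1(\theta)^{-1}\right),$$

so the inverse information describes the variance of the MLE only approximately, in large samples.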

Max
  • Thanks a lot! I will try to prove the last part (that the extra term vanishes). Regards –  Nov 28 '19 at 13:36
  • Hi, could you please take a look at my UPDATE 2? I would like your advice on the general formula for the likelihood and the associated relation that you initially proved. Thanks! –  Jan 26 '20 at 11:50
  • By $p$ you mean the pdf. Great work! – Maverick Meerkat Apr 01 '20 at 15:41