Goal
I would like to prove that the Negative Log Likelihood function of a sample drawn from a Normal Distribution is convex.
Below is a figure showing an example of such a function:
The motivation for this question is detailed at the end of the post.
Sketching the proof
What I have done so far...
First, I wrote down the likelihood function for a single observation:
$$\mathcal{L} (\mu, \sigma \mid x) = \frac{1}{\sqrt{2\pi\sigma^2} } e^{ -\frac{(x-\mu)^2}{2\sigma^2} }$$
For convenience, I take the negative logarithm of the likelihood function; I think this does not change the extremum, because the logarithm is a monotonic function:
$$f(\mu, \sigma \mid x) = -\ln \mathcal{L} (\mu, \sigma \mid x) = \frac{1}{2}\ln(2\pi) + \ln(\sigma) +\frac{(x-\mu)^2}{2\sigma^2}$$
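Since the observations are independent, the Negative Log Likelihood of the whole sample is simply the sum of the per-observation terms:
$$ f_N(\mu, \sigma) = \sum\limits_{i=1}^{N} f(\mu, \sigma \mid x_i) = \frac{N}{2}\ln(2\pi) + N\ln(\sigma) + \frac{1}{2\sigma^2}\sum\limits_{i=1}^{N}(x_i-\mu)^2 $$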
Then I can compute the gradient of $f$ (a column vector; the Jacobian is its transpose):
$$ \nabla f = \left[ \begin{matrix} -\frac{x-\mu}{\sigma^2}\\ -\frac{(x-\mu)^2 - \sigma^2}{\sigma^3} \end{matrix} \right] $$
And the Hessian of the Negative Log Likelihood function:
$$ \mathbf{H} = \left[ \begin{matrix} \frac{1}{\sigma^2} & \frac{2(x - \mu)}{\sigma^3} \\ \frac{2(x - \mu)}{\sigma^3} & \frac{3(x - \mu)^2 - \sigma^2}{\sigma^4} \end{matrix} \right] $$
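As a sanity check, these derivatives can be verified symbolically; a minimal sketch using sympy (the symbol names are my own):

```python
import sympy as sp

# sigma > 0 so that the density and its logarithm are well defined
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Negative Log Likelihood of a single observation
f = sp.log(2*sp.pi)/2 + sp.log(sigma) + (x - mu)**2 / (2*sigma**2)

grad = sp.Matrix([sp.diff(f, v) for v in (mu, sigma)])
hess = sp.hessian(f, (mu, sigma))

print(sp.simplify(grad))  # matches the gradient above
print(sp.simplify(hess))  # matches the Hessian above
```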
Now I compute the averaged Hessian of the Negative Log Likelihood function for $N$ observations, where $\mathbf{H}_i$ is the Hessian above evaluated at $x_i$ (the factor $\frac{1}{N}$ is positive, so it does not affect definiteness):
$$ \mathbf{A} = \frac{1}{N}\sum\limits_{i=1}^{N}\mathbf{H}_i = \left[ \begin{matrix} \frac{1}{\sigma^2} & \frac{2(\bar{x} - \mu)}{\sigma^3} \\ \frac{2(\bar{x} - \mu)}{\sigma^3} & \frac{\frac{3}{N}\sum\limits_{i=1}^{N}(x_i-\mu)^2 - \sigma^2}{\sigma^4} \end{matrix} \right] $$
If everything is right at this point:
- Proving the function is convex is equivalent to proving that the Hessian is positive semidefinite everywhere on the (convex) domain $\sigma > 0$;
Additionally, I know that:
- A positive semidefinite matrix must have all its eigenvalues non-negative;
- Because the Hessian matrix is symmetric, all its eigenvalues are real.
So if I prove that all eigenvalues are non-negative real numbers, then I can claim the function is convex. Since $\mathbf{A}$ is a symmetric $2 \times 2$ matrix, we can equivalently check, as @LinAlg suggested, that both the determinant and the trace of $\mathbf{A}$ are non-negative:
$$ \begin{align} \det(\mathbf{A}) \geq 0 \Leftrightarrow & \frac{3}{N}\sum\limits_{i=1}^{N}(x_i-\mu)^2 - \sigma^2 - 4(\bar{x} - \mu)^2 \geq 0 \\ \operatorname{tr}(\mathbf{A}) \geq 0 \Leftrightarrow & \frac{3}{N}\sum\limits_{i=1}^{N}(x_i-\mu)^2 \geq 0 \end{align} $$
It is obvious that $\operatorname{tr}(\mathbf{A}) \geq 0$.
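Before attacking the determinant inequality, these conditions can at least be checked numerically. A minimal sketch with numpy (the sample, parameters and evaluation point are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)          # arbitrary seed
x = rng.normal(loc=2.0, scale=3.0, size=1000)

def avg_hessian(mu, sigma):
    """Averaged Hessian A of the Negative Log Likelihood at (mu, sigma)."""
    off = 2*(x.mean() - mu)/sigma**3
    return np.array([
        [1/sigma**2, off],
        [off, (3*np.mean((x - mu)**2) - sigma**2)/sigma**4],
    ])

A = avg_hessian(mu=2.0, sigma=3.0)       # evaluated at the true parameters
print(np.linalg.det(A), np.trace(A))     # both expected to be non-negative
print(np.linalg.eigvalsh(A))             # real eigenvalues, expected >= 0
```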
Inequality
The inequality $\det(\mathbf{A}) \geq 0$ is not obvious at first glance; it requires a bit of algebra. Expanding all the squares, applying the sum, simplifying and grouping gives (the intermediate steps are detailed just below):
$$ 3\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - \bar{x}^2\right] -(\bar{x}-\mu)^2 -\sigma^2 \geq 0 $$
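For completeness, the intermediate steps are (using $\frac{1}{N}\sum_{i=1}^{N} x_i = \bar{x}$):
$$ \begin{align} \frac{3}{N}\sum\limits_{i=1}^{N}(x_i-\mu)^2 - \sigma^2 - 4(\bar{x}-\mu)^2 = & \ 3\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - 2\mu\bar{x} + \mu^2\right] - 4\left[\bar{x}^2 - 2\mu\bar{x} + \mu^2\right] - \sigma^2 \\ = & \ 3\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - \bar{x}^2\right] - \left(\bar{x}^2 - 2\mu\bar{x} + \mu^2\right) - \sigma^2 \\ = & \ 3\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - \bar{x}^2\right] - (\bar{x}-\mu)^2 - \sigma^2 \end{align} $$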
Now I can rewrite it using the usual variance estimator $s^2_x = \frac{1}{N-1}\sum\limits_{i=1}^{N}(x_i - \bar{x})^2$, noting that $\frac{1}{N}\sum\limits_{i=1}^{N}x_i^2 - \bar{x}^2 = \frac{N-1}{N}s^2_x$:
$$ 3\left[\frac{N-1}{N}s^2_x - \sigma^2\right] + 2\sigma^2 \geq (\bar{x}-\mu)^2 $$
For $N$ sufficiently large, $\frac{N-1}{N} \to 1$ and this tends to:
$$ 3\left(s^2_x - \sigma^2\right) + 2\sigma^2 \geq (\bar{x}-\mu)^2 $$
Or:
$$ \left|\bar{x}-\mu\right| \leq \sqrt{3s^2_x - \sigma^2} $$
Provided the radicand is non-negative. This last inequality gives a bound on the absolute error of the mean estimate, which must be lower than approximately $\sqrt{2}\sigma$ when $s^2_x \approx \sigma^2$. Finally, if the estimators converge to their expected values ($\bar{x} \to \mu$, $s^2_x \to \sigma^2$), it reduces to:
$$ \sigma \geq 0 $$
Which is trivially true, since $\sigma > 0$ by definition.
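To get a feeling for how often this bound holds in practice, here is a small Monte Carlo sketch (parameters, sample size and number of trials are arbitrary assumptions of mine):

```python
import numpy as np

rng = np.random.default_rng(7)           # arbitrary seed
mu, sigma, N = 2.0, 3.0, 1000
trials = 10_000

hits = 0
for _ in range(trials):
    x = rng.normal(mu, sigma, N)
    xbar, s2 = x.mean(), x.var(ddof=1)   # unbiased variance estimator
    # For N = 1000 the radicand 3*s2 - sigma**2 is positive in practice
    hits += abs(xbar - mu) <= np.sqrt(3*s2 - sigma**2)

print(hits / trials)                     # expected to be essentially 1.0
```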
My interpretation of this inequality is:
If we have a sufficiently large sample drawn from a Normal Distribution, and the mean and variance estimates are close enough to their expected values, then the Negative Log Likelihood function should be convex. If the expected values $\mu$ and $\sigma$ are known, it is possible to assess the convexity.
Questions
- Are my reasoning and the interpretation of the result correct?
- Can we formally prove that $\det(\mathbf{A}) \geq 0$?
Motivation
This question arose from a numerical example I am developing. I am sampling from a Normal Distribution $\mathcal{N}(\mu=2, \sigma=3)$ with $N=1000$, and I would like to highlight all the steps of a Maximum Likelihood Estimation. When I visualized the function to minimize, I wondered: can we say that this function is convex? That is what made me write this post on MSE.
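For reference, a minimal sketch of such an experiment (the starting point and the optimizer are my own choices, not necessarily those of the original example):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)           # arbitrary seed
x = rng.normal(loc=2.0, scale=3.0, size=1000)

def nll(theta):
    """Average Negative Log Likelihood of the sample at theta = (mu, sigma)."""
    mu, sigma = theta
    return 0.5*np.log(2*np.pi) + np.log(sigma) + np.mean((x - mu)**2)/(2*sigma**2)

# Keep sigma strictly positive; with bounds, minimize defaults to L-BFGS-B
res = minimize(nll, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
print(res.x)                             # expected to be close to (2, 3)
```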




