
I am currently looking into the Maximum Likelihood Estimate (MLE) for the mean $\mu$ and the variance $\sigma^2$ of a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$, given a set of samples

$$\left\{x_i \ \vert\ x_i \in \mathbb{R}, i = 1, \ldots, n \right\}.$$

The MLE is given by

\begin{align*} \mu^* &= \frac{1}{n} \sum_{i=1}^n x_i \\ \left(\sigma^2\right)^* &= \frac{1}{n} \sum_{i=1}^n (x_i - \mu^*)^2 \end{align*}

and is indeed the global maximizer, as shown in this thread. It is, however, easy to show that the likelihood is not jointly concave in $\mu$ and $\sigma^2$, as done in this thread. More precisely, that thread considers the equivalent problem of minimizing the negative log-likelihood and shows that the negative log-likelihood is non-convex.
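
The non-convexity can also be checked numerically. The following is a minimal sketch, assuming the parametrization $(\mu, v)$ with $v = \sigma^2$; the function name `nll_hessian` and the sample values are made up for illustration. It evaluates the Hessian of the negative log-likelihood at the MLE and at a point with three times the MLE variance.

```python
import numpy as np

def nll_hessian(x, mu, var):
    """Hessian of the Gaussian negative log-likelihood w.r.t. (mu, var).

    Hypothetical helper: var stands for sigma^2, which is treated as the
    optimization variable (not sigma).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    r = x - mu                      # residuals x_i - mu
    s = np.sum(r ** 2)              # sum of squared residuals
    h_mu_mu = n / var
    h_mu_var = np.sum(r) / var ** 2
    h_var_var = -n / (2.0 * var ** 2) + s / var ** 3
    return np.array([[h_mu_mu, h_mu_var],
                     [h_mu_var, h_var_var]])

# Illustrative samples (an assumption, not data from the question).
x = np.array([1.0, 2.0, 4.0, 7.0])
mu_hat = x.mean()
var_hat = np.mean((x - mu_hat) ** 2)

# Eigenvalues at the MLE and at a point with three times the MLE variance.
eig_at_mle = np.linalg.eigvalsh(nll_hessian(x, mu_hat, var_hat))
eig_far = np.linalg.eigvalsh(nll_hessian(x, mu_hat, 3.0 * var_hat))

print("eigenvalues at the MLE:   ", eig_at_mle)
print("eigenvalues at 3 * var_hat:", eig_far)
```

Both eigenvalues at the MLE should be positive, while at the larger variance one eigenvalue should be negative: with $\mu = \mu^*$, the second derivative with respect to $v$ changes sign at $v = 2v^*$, where $v^*$ is the MLE variance.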

I have already shown in the very same thread that there is a non-convex set $G$ that contains the MLE and on which the negative log-likelihood is convex.

Now my question: Why does the negative log-likelihood have no other minimizers outside of the set $G$ that satisfy the stationarity condition

$$\nabla_{\mu,\sigma^2} \left[ -\log\left(\prod_{i=1}^n \mathcal{N}(x_i \vert \mu, \sigma^2)\right) \right] = 0\quad?$$

More generally, is it possible that a function is non-convex on $\mathbb{R}^n$ but has only one minimizer (or a set of minimizers with the same function value)?

htipk
  • 97

1 Answer


It is key that the (location of the) maximum over $\mu$ does not depend on $\sigma$. For that optimal value of $\mu$, there is again a unique optimal $\sigma$. Therefore, you can just optimize sequentially and you do not need joint concavity.
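
The sequential argument can be written out as follows. This is a sketch in which the symbols $\ell$, $g$, and $s$ are introduced here for convenience, and the samples are assumed not to be all equal, so that $s > 0$. The negative log-likelihood is

$$\ell(\mu, \sigma^2) = \frac{n}{2} \log\left(2\pi\sigma^2\right) + \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 .$$

For every fixed $\sigma^2 > 0$ this is a strictly convex quadratic in $\mu$, and

\begin{align*} \frac{\partial \ell}{\partial \mu} = -\frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0 \quad \Longleftrightarrow \quad \mu = \mu^* = \frac{1}{n} \sum_{i=1}^n x_i , \end{align*}

which does not involve $\sigma^2$. Substituting $\mu^*$ and writing $s = \frac{1}{n} \sum_{i=1}^n (x_i - \mu^*)^2$ gives

\begin{align*} g(\sigma^2) &= \ell(\mu^*, \sigma^2) = \frac{n}{2} \log\left(2\pi\sigma^2\right) + \frac{n s}{2\sigma^2} , \\ g'(\sigma^2) &= \frac{n}{2\sigma^4} \left(\sigma^2 - s\right) . \end{align*}

So $g$ is decreasing for $\sigma^2 < s$ and increasing for $\sigma^2 > s$, with its only stationary point at $\sigma^2 = s$. Any stationary point of $\ell$ must satisfy $\mu = \mu^*$ and then $\sigma^2 = s$, and for every $(\mu, \sigma^2)$

$$\ell(\mu, \sigma^2) \ \geq\ \ell(\mu^*, \sigma^2) \ \geq\ \ell(\mu^*, s) ,$$

with equality throughout only at $(\mu^*, s)$.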

LinAlg
  • 20,093
  • Thanks for the answer! But even though we can optimize the function despite its non-convexity, shouldn't there exist other maxima as well, outside of the set $G$ that I mentioned? Interesting side fact: when fixing $\mu$ to the sample mean, the MLE objective becomes convex w.r.t. $\sigma$ on all of $\mathbb{R}_{>0}$ – htipk Apr 18 '20 at 20:32
  • @pklink no, as there is no other local optimum for which $\mu$ does not equal the global optimum, and for that $\mu$, there is also no optimum for which $\sigma$ does not equal the global optimum. – LinAlg Apr 18 '20 at 20:33
  • Ah, I see! So in this case, the convexity w.r.t. $\mu$ regardless of $\sigma$ "saved the day". Do you know of any theory on this type of problem? That would be very insightful. – htipk Apr 18 '20 at 20:41
  • @pklink convexity w.r.t. mu regardless of sigma by itself is not enough, e.g., $f(x,y) = (x-y)^2$ does not have a unique optimum even though it is convex w.r.t. $x$ and w.r.t. $y$. – LinAlg Apr 18 '20 at 20:50
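
The counterexample $f(x,y) = (x-y)^2$ from the last comment can be checked with a short sketch. The grid values and variable names below are arbitrary choices for illustration. For each fixed $y$ the minimizer over $x$ is $x = y$, so it depends on $y$, and the resulting profile over $y$ is flat.

```python
import numpy as np

def f(x, y):
    """Counterexample from the comment: convex in x and in y separately."""
    return (x - y) ** 2

# For each fixed y, the minimizer over x is x = y (it depends on y).
ys = np.linspace(-5.0, 5.0, 11)
profile = f(ys, ys)          # value of min over x of f(x, y), for each y

# The profile is identically zero, so every point on the line x = y
# is a global minimizer and the optimum is not unique.
print(profile)

# Any point off the line x = y has a strictly larger value.
off_diagonal_value = f(1.0, 2.0)
print(off_diagonal_value)
```

In the Gaussian case the situation differs in two ways: the minimizer over $\mu$ is the same for every $\sigma$, and the remaining one-dimensional problem in $\sigma$ has a single minimizer rather than a flat profile.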