
The negative logarithm of the Student-t probability density function is $$f(\nu,x) := -\ln\Gamma\left(\frac{\nu+1}{2}\right) +\ln\Gamma\left(\frac{\nu}{2}\right) +\frac{1}{2}\ln(\pi\nu) +\frac{\nu+1}{2}\ln\left(1+\frac{x^2}\nu\right)$$

How would one prove or disprove there is only one local minimum with respect to $\nu>0$ for any given $x$?

Numerical computation seems to suggest that $f(\nu,x)$ is strictly decreasing in $\nu\in(0,\infty)$ for $x\in[0,1.5]$, and that for $x\in (b,\infty)$ with some $b\ge 1.5$, $f(\nu,x)$ is convex on $(0,a)$ and concave on $(a,\infty)$ for some $a>0$ depending on $x$.
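For what it's worth, the numerical evidence can be reproduced with a short script (a sketch assuming NumPy/SciPy; the helper name `f` mirrors the definition above):

```python
import numpy as np
from scipy.special import gammaln

def f(nu, x):
    # negative log-density of the Student-t distribution, as defined in the question
    return (-gammaln((nu + 1) / 2) + gammaln(nu / 2)
            + 0.5 * np.log(np.pi * nu)
            + (nu + 1) / 2 * np.log1p(x**2 / nu))

nu = np.linspace(0.01, 50, 5000)
for x in (0.0, 1.0, 1.5):
    # f(., x) appears strictly decreasing on this grid
    assert np.all(np.diff(f(nu, x)) < 0)
for x in (2.0, 3.0):
    # for larger x an interior minimum appears on the grid
    assert np.argmin(f(nu, x)) not in (0, len(nu) - 1)
```

This only checks a finite grid, of course; it proves nothing, but it matches the claimed behavior on both sides of $x\approx 1.5$.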


To facilitate the solution, I post the first and second partial derivatives of $f$ as follows.

\begin{align}2\frac{\partial f}{\partial \nu}=\frac{1-x^2}{\nu+x^2}+\ln\Big(1+\frac{x^2}\nu\Big)-\int_0^\infty \frac{e^{-\frac \nu2t}}{1+e^{-\frac t2}}\,dt, \end{align} $$4\frac{\partial^2 f}{\partial \nu^2}=-2\frac{\nu+x^4}{\nu(\nu+x^2)^2}+\int_0^\infty \frac{te^{-\frac \nu2t}}{1+e^{-\frac t2}}\,dt$$
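The integral representation can be cross-checked numerically against direct differentiation via the digamma function $\psi$, using the identity $\int_0^\infty \frac{e^{-\nu t/2}}{1+e^{-t/2}}\,dt=\psi\big(\frac{\nu+1}{2}\big)-\psi\big(\frac{\nu}{2}\big)$ implicitly (a sketch assuming SciPy; the function names are mine):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma

def two_dfdnu_digamma(nu, x):
    # 2*df/dnu obtained by differentiating f directly (psi = digamma)
    y = x**2
    return ((1 - y) / (nu + y) + np.log1p(y / nu)
            - digamma((nu + 1) / 2) + digamma(nu / 2))

def two_dfdnu_integral(nu, x):
    # the integral representation displayed above
    y = x**2
    I, _ = quad(lambda t: np.exp(-nu * t / 2) / (1 + np.exp(-t / 2)), 0, np.inf)
    return (1 - y) / (nu + y) + np.log1p(y / nu) - I

for nu in (0.5, 2.0, 10.0):
    for x in (0.7, 1.5, 3.0):
        assert abs(two_dfdnu_digamma(nu, x) - two_dfdnu_integral(nu, x)) < 1e-6
```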

Hans
  • 10,484
  • It seems $f(v, 1)$ is strictly decreasing on $(0, \infty)$? – River Li May 15 '21 at 03:21
  • 1
    @RiverLi: Yes, it seems so judging from numerical computation. It seems $f(\nu,x)$ is strictly decreasing with respect to $\nu\in(0,\infty)$ for $x\in[0,1.5]$. – Hans May 15 '21 at 04:01
  • Would it help to assume $v$ is an integer, so that you can reduce $\Gamma(v/2+1/2)$ and $\Gamma(v/2)$ to algebraic expressions and analyze the trend? –  May 16 '21 at 04:18
  • @Balajisb: Sure. That could be an initial analysis. What is your insight? – Hans May 16 '21 at 07:51
  • I would try to exhibit $\partial f/\partial \nu$ as the Laplace transform of some function that cannot have more than 2 sign changes. – kimchi lover May 19 '21 at 11:52
  • @kimchilover: I posted $\frac{\partial f}{\partial \nu}$ as above. But I have not been able to put it into the form of a Laplace transform. – Hans May 19 '21 at 14:13
  • Use A&S formulas 29.3.8 and 29.3.104 to finish the job of finding the inverse Laplace transform. (Note that in your integral you can factor out $\exp(-\nu t/2)$ and then express what's left as $1/(1+\exp(t/2)$...) – kimchi lover May 19 '21 at 14:28
  • @Hans I have a feeling that $f$ is strictly decreasing for $x<\sqrt{e}$. – K.defaoite May 19 '21 at 15:40
  • You can approximate the derivative of log Gamma extremely well by $\log z-\frac{1}{2z}$. – K.defaoite May 19 '21 at 16:32
  • @kimchilover: Right. I actually thought about that before, but was intimidated by the apparent complexity of the expression. I can compute the inverse Laplace transform without looking up any tables, and I have written it up below as a draft of an answer. Please check it out. On the other hand, while a single-signed inverse Laplace transform determines the sign of its Laplace transform, much seems left to be done when the inverse has two sign changes. – Hans May 19 '21 at 18:45
  • 1
    I have no proof, but I have strong numerical evidence that when $$x > \sqrt{1 + \sqrt{2}} \approx 1.5537739740300373073,$$ the log-likelihood transitions from having no critical point to having a unique critical point for $\nu$. – heropup May 19 '21 at 18:52

3 Answers


The following used to contain a mistake, found by Hans. I have found a fix and incorporated it into the proof below.


Here is a sketch of an argument that the log likelihood function has at most one critical point. It should be read in conjunction with Hans's partial answer.

The idea is to use the "variation diminishing property of the Laplace transform", according to which the Laplace transform of a function with $k$ sign changes cannot have more than $k$ sign-changing zero-crossings. For a function $\phi:\mathbb R^+\to\mathbb R$, let $S(\phi)$ be the maximal $k$ for which there exist $0<x_0<x_1<x_2<\cdots < x_{k}$ for which $\phi(x_i)\phi(x_{i+1})<0$ for all $0\le i < k$. Then the Laplace transform $g(s)=\int_0^\infty e^{-sx} G(x)dx$ of $G$ obeys $S(g)\le S(G)$. This topic is not well explained in Wikipedia articles, but the result used here is in chap V, paragraph 80 in vol 2 of Pólya and Szegő's Problems and Theorems in Analysis (p. 225 in my copy), is discussed at length in Karlin's book Total Positivity (see Theorem 3.1, page 21, and pages 233 and 237), in papers by I.J. Schoenberg, etc. One can think of it as a continuous analogue of Descartes' Rule of Signs. I used it in answering this MSE problem.

If the logarithm of the likelihood function had two or more local maxima, its derivative would have three or more roots, since between every two local maxima lies a local minimum. So it suffices, by the variation diminishing property of the LT, to show that what the OP, in his draft answer, calls $\tilde f$ has at most two sign changes.

This seems evident numerically, but deserves proof just as much as the original problem does. Here is one way of seeing this, using another application of the variation diminishing property of the Laplace transform.

Here is the argument. First, I will change notation, using $s$ instead of $t$ and setting $y=x^2$. The claim is that, for fixed real $y\ge0$, $$\tilde f(s) = \frac{1-e^{-ys}}s +(1-y)e^{-ys}-\frac 2{1+e^{-s}}$$ has at most two sign changes as a function of $s\in\mathbb R^+$. Let $g(s)=\dfrac{1+e^{-s}}{s^2}\tilde f(s)$; clearly $g$ has as many sign changes as $\tilde f$ does. But $g$ is itself a Laplace transform: \begin{align}g(s)&=\frac{1+e^{-s}-e^{-ys}-e^{-(y+1)s}}{s^3}+(1-y)\frac{e^{-ys}+e^{-(y+1)s}}{s^2} - \frac2{s^2}\\ &=\int_0^\infty e^{-sx} G(x) dx,\end{align} from which one reads off \begin{align}G(x)&=\frac 1 2\left((x)_+^2 - (x-y)_+^2 +(x-1)_+^2 - (x-(y+1))_+^2\right) \\&+ (1-y)\left((x-y)_++(x-y-1)_+\right)-2x.\end{align} Here $(x)_+=\max(x,0)$. Since $x\mapsto (x)_+$ is continuous, so is $G$. If $y<1$ the function $G$ is piecewise quadratic on each of the intervals $(0,y)$, $(y,1)$, $(1,y+1)$, $(y+1,\infty)$; if $y>1$ then $G$ is piecewise quadratic on the intervals $(0,1)$, $(1,y)$, $(y,y+1)$, $(y+1,\infty)$, so verification of the lemma is in principle easy in a case-by-case manner. In practice it is tedious and error-prone.

If $y<1$ the formula for $G(x)$ reduces to $$G(x)=\begin{cases} x^2/2 -2x&0\le x<y\\ y^2/2-x-y&y\le x<1\\ x^2/2-2x+(y-1)^2/2&1\le x<1+y\\ y^2-2y-1&1+y\le x\end{cases}$$ and if $y>1$, the formula reduces to $$G(x)=\begin{cases} x^2/2 -2x&0\le x<1\\ x^2-3x+1/2&1\le x<y\\ x^2/2-2x+(y-1)^2/2&y\le x<1+y\\ y^2-2y-1&1+y\le x.\end{cases}$$

These can be merged into the following, where the cases are referred to below: $$ G(x)=\begin{cases} x^2/2 -2x&\text{A: if }0\le x<\min(1,y)\\ y^2/2-x-y&\text{B: if }y\le x< 1\\ x^2-3x+1/2&\text{C: if }1\le x< y\\ x^2/2-2x+(y-1)^2/2&\text{D: if }\max(1,y)\le x< 1+y\\ y^2-2y-1&\text{E: if }1+y< x \end{cases} $$ Note that cases B and C are mutually exclusive. Computations show that for fixed $y$ the function $G(x)$ has at most one sign change; I sketch an argument for this below. (Omitting an analysis of the possibility of sign changes at the case boundaries.)

$G$ has no sign changes in cases A or E (in A, the only possibilities are $x=0$ or $x=4$, the former is not a sign change, and $x=4$ does not obey $0\le x<\min(1,y)$. Constant functions, as in case E, do not have sign changes.) Case B has no sign changes, for the value $x=y^2/2-y$ violates $y<x<1$. In case C, a sign change could only occur at $x=(3\pm\sqrt 7)/2$, and then $1<x<y$ implies $x=(3+\sqrt7)/2$ and $y>(3+\sqrt7)/2$. In case D, a sign change can only occur at $x=2\pm\sqrt{3+2y-y^2}$, and $\max(1,y)\le x<1+y$ is only possible if $x=2+\sqrt{3+2y-y^2}$ and $1+\sqrt 2<y<(3+\sqrt 7)/2$. Putting these together: if $y<1$ then there can be no sign changes in the relevant cases A,B,D,E. If $y>1$ there might be at most one sign change in each of C, D (out of the relevant A,C,D,E), but not actually both, since that would violate $(3+\sqrt7)/2 <y<(3+\sqrt 7)/2$. Hence, $G$ has at most one sign change among A,B,C,D,E.

Finally, since $G(0)=0$, $G'(0)=-2$, and $G(\infty)=y^2-2y-1$, we see $G$ has exactly one sign change if $y^2-2y-1>0$ and none if $ y^2-2y-1\le0$.
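As a sanity check of both the piecewise formulas and the sign-change count, one can evaluate $G$ directly from its $(x)_+$ definition on a fine grid (a sketch assuming NumPy; `sign_changes` counts strict sign alternations, ignoring near-zero values):

```python
import numpy as np

def pos(u):
    # (u)_+ = max(u, 0)
    return np.maximum(u, 0.0)

def G(x, y):
    # G from the (x)_+ definition in the answer
    return (0.5 * (pos(x)**2 - pos(x - y)**2 + pos(x - 1)**2 - pos(x - (y + 1))**2)
            + (1 - y) * (pos(x - y) + pos(x - y - 1)) - 2 * x)

def sign_changes(vals, tol=1e-12):
    s = np.sign(vals[np.abs(vals) > tol])
    return int(np.sum(s[1:] != s[:-1]))

x = np.linspace(1e-6, 10, 200001)
for y in (0.5, 1.0, 2.0, 2.4):          # y^2 - 2y - 1 <= 0: no sign change
    assert sign_changes(G(x, y)) == 0
for y in (2.5, 3.0, 5.0):               # y^2 - 2y - 1 > 0: exactly one
    assert sign_changes(G(x, y)) == 1
```

The transition at $y=1+\sqrt2\approx 2.414$ is visible between the grids $y=2.4$ and $y=2.5$.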

The meta-motivation is to shoehorn the original question into an application of the variation diminishing machinery given in my first paragraphs. The micro-motivation for my choice of $g$ (and hence of $G$) comes from the realization that $\tilde f$ is the Laplace transform of the signed measure $$\mu = \lambda_{[0,y]} + (1-y)\delta_y -2\sum_{k\ge0}(-1)^k \delta_k,$$ where $\lambda_{[0,y]}$ is Lebesgue measure restricted to $[0,y]$ and $\delta_k$ represents the unit point mass at $k$. The signed measure $\mu$ has infinitely many sign changes, but the telescoping series $\mu*(\delta_0+\delta_1)$ does not, where $*$ denotes convolution of measures, so $1+e^{-s}$ times $\tilde f$ is a better candidate for the variation diminishing trick sketched above.

Dividing by a power of $s$ has the effect of smoothing the signed measure, and eliminating some small oscillations that create their own extraneous sign changes. The mistake Hans found in an earlier version of this answer was to divide by $s$, which allowed for 3 sign changes for a certain range of $y$. Dividing by $s^2$ fixed this problem, at the price of making $G$ piecewise quadratic instead of piecewise linear.

kimchi lover
  • 24,981
  • If $1<y<2$, does $G(x)$ not change sign $3$ times rather than $1$ time --- if we do not count reaching value $0$ as sign change? – Hans May 26 '21 at 07:24
  • Yes. For example, according to the definition of $G$, when $y=1.75$, pick $x=(1.2,1.7,1.9,2.5)$; we have $G(1.2)=-0.6, G(1.7)=0.4, G(1.9)=-0.1, G(2.5)=0.5$. That is, $G(x_i)G(x_{i+1})<0$ for all $i\in\{1,2,3\}$. The sign changes $3$ times. – Hans May 26 '21 at 12:51
  • Thanks! This is embarrassing. I suppose that dividing by an extra power of $s$ would fix things. – kimchi lover May 26 '21 at 13:31
  • Intriguing! Is there a theory for controlling the number of sign changes by dividing by powers of, say, $s$? Marvelous proof, kimchi lover! +1 and accepted! – Hans May 26 '21 at 17:13
  • Thanks! If $h(s)$ is the LT of function $H(t)$ then $h(s)/s$ is the LT of the indefinite integral of $H(t)$, so some oscillations in $H(t)$ have a chance of being smoothed out, possibly resulting in fewer sign changes in the $t$ domain. I suppose that's what's happening here. This is more of a heuristic than a theory, though. – kimchi lover May 26 '21 at 18:19
  • @kimchilover It is interesting. (+1) – River Li May 27 '21 at 01:00
  • @Hans I have finished revising my answer, supplying detail, etc. An interesting question, and something of a wonder that it has not been posed and answered in the stats literature long since. – kimchi lover May 27 '21 at 17:23
  • I really appreciate it. I love your answer, not only supplying a proof, but much more importantly, introducing, at least to me, a novel methodology and a theory. Marvelous! You deserve all the bounty, upvotes and more. I will check out all the details later. – Hans May 27 '21 at 20:25
  • Would you please be so kind as to supply the detail of the reference to vol. 2 of Pólya and Szegő's Problems and Theorems in Analysis? I cannot find it now. – Hans Sep 20 '21 at 21:24
  • @Hans p.225 in my 1976 paperback edition of P&S v2. – kimchi lover Sep 21 '21 at 12:21

As suggested by @kimchilover, I write \begin{align}2\frac{\partial f}{\partial \nu} &=\frac{1-x^2}{\nu+x^2}+\ln\Big(1+\frac{x^2}\nu\Big)-\int_0^\infty \frac{e^{-\frac \nu2t}}{1+e^{-\frac t2}}\,dt \\ &= \int_0^\infty \tilde f(t,x)e^{-\nu t}dt, \end{align} where $$\tilde f(t,x):=\frac{1-e^{-x^2t}}t+(1-x^2)e^{-x^2t}-\frac2{1+e^{-t}}=\frac1t+\Big(-\frac1t+1-x^2\Big)e^{-x^2t}-\frac2{1+e^{-t}}.$$ However, more work seems needed in the case where $\tilde f(t,x)$ changes sign twice in $t$, as happens for large $x$.
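The representation can be checked numerically by comparing the Laplace transform of $\tilde f$ against the digamma form of the derivative (a sketch assuming SciPy; function names are mine):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import digamma

def f_tilde(t, y):
    # the inverse Laplace transform above, with y = x^2
    return (1 - np.exp(-y * t)) / t + (1 - y) * np.exp(-y * t) - 2 / (1 + np.exp(-t))

def two_dfdnu(nu, y):
    # 2*df/dnu via digamma, for comparison
    return ((1 - y) / (nu + y) + np.log1p(y / nu)
            - digamma((nu + 1) / 2) + digamma(nu / 2))

for nu in (0.5, 3.0):
    for y in (0.25, 4.0):
        lt, _ = quad(lambda t: f_tilde(t, y) * np.exp(-nu * t), 0, np.inf)
        assert abs(lt - two_dfdnu(nu, y)) < 1e-6
```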

Again running with @kimchilover's idea, set $h(t,y):=(1+e^{-t})\tilde f(t,y)$; the computation of the resulting inverse transform is completed in kimchi lover's answer above.

Hans
  • 10,484

Just some thoughts (I will add proofs in the future, if possible.)

For convenience, we replace $x^2$ with $y$.

We have $$ \frac{\partial f}{\partial v} = \frac{1 - y}{2v + 2y} + \frac12 \ln \left(1 + \frac{y}{v}\right) - \frac12 \psi\left(\frac{v + 1}{2}\right) + \frac12\psi\left(\frac{v}{2}\right) $$ where $\psi(\cdot)$ is the digamma function defined by $\psi(u) = \frac{\mathrm{d} \ln \Gamma(u)}{\mathrm{d} u} = \frac{\Gamma'(u)}{\Gamma(u)}$.

(i) If $0 < y \le 1$, then \begin{align*} \frac{\partial f}{\partial v} &\le \frac{1 - y}{2v} + \frac12 \cdot \frac{y}{v} - \frac12 \left(\ln \frac{v + 1}{2} - \frac{1}{v + 1} - \frac{1}{3(v + 1)^2}\right) + \frac12\left(\ln \frac{v}{2} - \frac{1}{v}\right)\\ &= \frac{1}{2(v + 1)} + \frac{1}{6(v + 1)^2} - \frac12\ln\left(1 + \frac{1}{v}\right)\\ &< 0 \end{align*} where we have used $\ln u - \frac{1}{2u} - \frac{1}{12u^2} < \psi(u) < \ln u - \frac{1}{2u}$ for all $u > 0$ (Theorem 5, [1]), and $\ln(1 + u) \le u$ for all $u \ge 0$. Note: The last inequality is easy to prove by taking derivative.

(ii) If $1 < y \le 1 + \sqrt{2}$, then \begin{align*} \frac{\partial f}{\partial v} &\le \frac{1 - (1 + \sqrt2)}{2v + 2(1 + \sqrt2)} + \frac12 \ln \left(1 + \frac{1 + \sqrt2}{v}\right)\\ &\quad - \frac12\left(\ln \frac{v + 1}{2} - \frac{1}{v + 1} - \frac{1}{3(v + 1)^2}\right)\\ &\quad + \frac12\left(\ln \frac{v}{2} - \frac{1}{v} - \frac{1}{12(v/2 + 1/14)^2}\right)\\ &< 0 \end{align*} where we have used $\ln u - \frac{1}{2u} - \frac{1}{12u^2} < \psi(u) < \ln u - \frac{1}{2u} - \frac{1}{12(u + 1/14)^2}$ for all $u > 0$ (Theorem 5, [1]), and $y \mapsto \frac{1 - y}{2v + 2y} + \frac12 \ln \left(1 + \frac{y}{v}\right)$ is strictly increasing on $(1, \infty)$. Note: The last inequality is easy to prove by taking derivative.
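The conclusions of (i) and (ii), namely $\frac{\partial f}{\partial v}<0$ for $0<y\le 1+\sqrt2$, can be spot-checked numerically (a sketch assuming SciPy; `dfdv` is my name for the digamma form of the derivative):

```python
import numpy as np
from scipy.special import digamma

def dfdv(v, y):
    # df/dv as displayed above, with y = x^2
    return ((1 - y) / (2 * v + 2 * y) + 0.5 * np.log1p(y / v)
            - 0.5 * digamma((v + 1) / 2) + 0.5 * digamma(v / 2))

v = np.linspace(0.01, 200, 20000)
for y in (0.1, 1.0, 2.0, 2.4):     # below the threshold 1 + sqrt(2)
    assert np.all(dfdv(v, y) < 0)
for y in (2.6,):                   # above the threshold: derivative turns positive
    assert np.any(dfdv(v, y) > 0)
```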

Remark: Where does $1 + \sqrt2$ come from? We rewrite $\frac{\partial f}{\partial v}$ as $$\frac{\partial f}{\partial v} = \frac{v \mathrm{e}^{A}}{2(v + y)} \left\{\left(1 + \frac{y}{v}\right)\mathrm{e}^{-A} \ln \left[\left(1 + \frac{y}{v}\right)\mathrm{e}^{-A}\right] + \left(1 + \frac{1}{v}\right)\mathrm{e}^{-A}\right\}$$ where $A = 1 + \psi\left(\frac{v + 1}{2}\right) - \psi\left(\frac{v}{2}\right)$. From $\frac{\partial f}{\partial v} = 0$, we solve $y = {{\mathrm e}^{{\mathrm{LambertW}}\left( - (1 + 1/v)\mathrm{e}^{-A} \right) +A}}\,v - v$ where $W(\cdot)$ is the Lambert W function. Maple tells us $\lim_{v\to \infty} \left( {{\mathrm e}^{{\mathrm{LambertW}}\left( - (1 + 1/v)\mathrm{e}^{-A} \right) +A}}\,v - v\right) = 1 + \sqrt{2}$. By the way, @heropup also pointed out in comment that the point $x = \sqrt{1 + \sqrt{2}}$ is a demarcation point.
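The Lambert W computation in the remark can be reproduced without Maple (a sketch assuming SciPy; `y_root` is my name for the solution curve, using the principal branch of $W$):

```python
import numpy as np
from scipy.special import digamma, lambertw

def y_root(v):
    # y solving df/dv = 0 for given v, via the Lambert W form above
    A = 1 + digamma((v + 1) / 2) - digamma(v / 2)
    w = lambertw(-(1 + 1 / v) * np.exp(-A)).real  # principal branch, real part
    return np.exp(w + A) * v - v

# the critical curve approaches 1 + sqrt(2) as v -> infinity
assert abs(y_root(1e4) - (1 + np.sqrt(2))) < 0.01
# heropup's demarcation point x = sqrt(1 + sqrt(2))
assert abs(np.sqrt(1 + np.sqrt(2)) - 1.5537739740300373) < 1e-12
```

Note the argument of $W$ approaches the branch point $-1/e$ from above as $v\to\infty$, which is where the limit $1+\sqrt2$ comes out.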

(iii) If $y > 1 + \sqrt{2}$ is fixed, we claim that $f$ has exactly one global minimizer $v^\ast$ on $(0, \infty)$; furthermore, $f$ is strictly decreasing on $(0, v^\ast)$ and strictly increasing on $(v^\ast, \infty)$.

We can prove the claim if the following conjecture is true:

Conjecture 1: $Q < 0$ for all $v > 0$, where $$Q = - 4 - v(v + 2)^2 \int_0^\infty \frac{t^2\mathrm{e}^{-vt/2}}{1 + \mathrm{e}^{-t/2}}\mathrm{d} t + (6v^2 + 16v + 8)\int_0^\infty \frac{t\mathrm{e}^{-vt/2}}{1 + \mathrm{e}^{-t/2}}\mathrm{d} t.$$ Numerical evidence shows its truth. However, we have not yet proved it.
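Conjecture 1 is easy to probe numerically (a sketch assuming SciPy; this only checks sample points and is of course not a proof):

```python
import numpy as np
from scipy.integrate import quad

def Q(v):
    # the quantity in Conjecture 1
    I1, _ = quad(lambda t: t * np.exp(-v * t / 2) / (1 + np.exp(-t / 2)), 0, np.inf)
    I2, _ = quad(lambda t: t**2 * np.exp(-v * t / 2) / (1 + np.exp(-t / 2)), 0, np.inf)
    return -4 - v * (v + 2)**2 * I2 + (6 * v**2 + 16 * v + 8) * I1

for v in (0.1, 0.5, 1.0, 5.0, 20.0):
    assert Q(v) < 0
```

A small-$t$ expansion of the integrands suggests $Q(v) \approx -32/v^2$ as $v \to \infty$, so $Q$ approaches $0$ from below, which is consistent with the numerics.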

References

[1] L. Gordon, "A stochastic approach to the gamma function", Amer. Math. Monthly, 101(9), 1994, 858-865.

River Li
  • 49,125
  • How do you show the last inequality involving the integral right below (ii)? What is the definition of $H$? – Hans May 25 '21 at 05:47
  • For $H$, you can see https://mathworld.wolfram.com/LambertW-Function.html. The solution for the equation $z \mathrm{e}^z = c$ is given by $z = W(c)$ where $W$ is the Lambert W function. – River Li May 25 '21 at 05:51
  • Are you using $h$ on that page to define $H$? The symbols are different. I am just saying you did not define $H$ in your answer. If $H$ is indeed $h$, it seems unlikely $\lim_{v\to \infty} H(v) = 1 + \sqrt{2}$. – Hans May 25 '21 at 06:07
  • @Hans I just use $H$ to denote the expression of solution $y$ which is irrelevant to the page. OK. I give you the expression of $H(v)$. It is $$H(v) = {{\rm e}^{{\mathrm {LambertW}} \left( -{\frac { \left( v+1 \right) {{\rm e}^ {-B-1}}}{v}} \right) +B+1}}v-v$$ where $B = \int_0^\infty \frac{\mathrm{e}^{-vt/2}}{1 + \mathrm{e}^{-t/2}}\mathrm{d} t$. – River Li May 25 '21 at 06:28
  • It was not me who down-voted you, in case anyone is wondering. I applaud your work, notwithstanding the incompleteness. I await you to finish your proof and capture the ensuing bounty. – Hans May 27 '21 at 20:29
  • @Hans Thanks. I hope I can prove it soon. – River Li May 27 '21 at 23:38