
Consider the univariate logistic regression model: $$ P(Y = 1\mid X = x) = \psi(x\beta_0)\equiv \frac 1 {1+\exp\{-x \beta_0\}},\quad\text{for all $x$, and some unknown $\beta_0\in\mathbb{R}$.} $$ Assume that $X$ has finite, positive variance and marginal distribution $Q(x)$. The score function based on one sample $(Y,X)$ is $$ Z(\beta :Y,X) = X\cdot\big\{Y-\psi(X\beta)\big\}. $$ The expected log-likelihood based on one sample is $$ M(\beta)\equiv \mathbf{E}\left[Y\log\psi(X\beta) + (1-Y)\log\big\{1-\psi(X\beta) \big\} \right], $$ where $\mathbf{E}$ denotes expectation under the true joint distribution of $(Y,X)$, i.e., under the parameter $\beta_0$.
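For reference, the displayed score is obtained by differentiating the one-sample log-likelihood in $\beta$ (this short computation is not in the original post); using $\psi^\prime(u)=\psi(u)\{1-\psi(u)\}$, $$ \frac{d}{d\beta}\Big[Y\log\psi(X\beta)+(1-Y)\log\{1-\psi(X\beta)\}\Big] = XY\{1-\psi(X\beta)\} - X(1-Y)\psi(X\beta) = X\big\{Y-\psi(X\beta)\big\}. $$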

My questions are:

(i) How can one show that $M(\beta)$ is finite for all $\beta$?

(ii) Is $M^\prime(\beta)=\mathbf{E}\big(Z(\beta:Y,X)\big)$ for all $\beta$? If so, what conditions are required?

cusat15

2 Answers

1. For question (i), $M(\beta)$ can be evaluated by first integrating out $Y$. Denote the log-likelihood by $\ell(\beta;X,Y)=Y\log[\psi(\eta)]+(1-Y)\log[1-\psi(\eta)]$, where $\eta=X\beta$, and write $\eta_0=X\beta_0$, so that $\mathbb{E}[Y\mid X]=\psi(\eta_0)$. Using $\log[\psi(\eta)]=\eta-\log[1+\exp(\eta)]$ and $\log[1-\psi(\eta)]=-\log[1+\exp(\eta)]$, $$\begin{align} M(\beta)&=\mathbb{E}\left\lbrace \mathbb{E}\left[\ell(\beta;X,Y) \mid X\right]\right\rbrace \\ & =\mathbb{E}\left\lbrace \psi(\eta_0) \log\left[\psi(\eta)\right] + [1-\psi(\eta_0)]\log\left[1-\psi(\eta)\right] \right\rbrace \\ &=\mathbb{E}\left\lbrace \psi(\eta_0)\,\eta \right\rbrace - \mathbb{E}\left\lbrace \log\left[1+\exp(\eta)\right] \right\rbrace . \end{align}$$ Notice that, since $0<\psi(\eta_0)<1$, $$\left\vert \psi(\eta_0)\,\eta \right\vert \leq \left\vert \eta \right\vert ,$$ and, by a second-order Taylor expansion (the second derivative of $\eta\mapsto\log[1+\exp(\eta)]$ is $\psi(\eta)[1-\psi(\eta)]\leq 1/4$), $$0< \log\left[1+\exp(\eta)\right] \leq \log(2)+\frac{\eta}2+\frac{\eta^2}8.$$ But $\eta=X\beta$, and by the assumption $0<\mathbb{V}(X)<\infty$ we have $\mathbb{E}\left(\eta^2\right)<\infty$, hence also $\mathbb{E}\vert\eta\vert<\infty$. So both terms in $M(\beta)$ are finite. (The quadratic bound is checked numerically in the sketch after the second point below.)

2. Question (ii) is equivalent to asking whether we can exchange the order of differentiation and integration (expectation), i.e., whether $$ \frac{\text{d}}{\text{d}\beta} \mathbb{E}[\ell(\beta;X,Y)] \stackrel{?}{=} \mathbb{E}\left[ \frac{\text{d}}{\text{d}\beta} \ell(\beta;X,Y) \right]. $$ This can be verified by dominated convergence. A sufficient condition is that the absolute value of the score is bounded, uniformly in $\beta$, by an integrable function. For any $\beta$, since $\vert Y-\psi(X\beta)\vert<1$, we have $$ \left\vert \frac{\text{d}}{\text{d}\beta} \ell(\beta;X,Y) \right\vert =\left\vert X\lbrace Y-\psi(X\beta)\rbrace \right\vert = |X|\cdot | Y-\psi(X\beta)| \leq |X| . $$ But $\mathbb{E}\left\lbrace |X|\right\rbrace <\infty$, because $\mathbb{E}(X^2)<\infty$ by the assumption on the variance of $X$. Therefore, the exchange of differentiation and expectation is justified (see the numerical sketch below).
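Both claims can be sanity-checked numerically. The following minimal Python sketch (not part of the original answer) first verifies the quadratic bound on $\log[1+\exp(\eta)]$ on a grid, and then, assuming purely for illustration that $X\sim N(0,1)$ and $\beta_0=1$ (neither is specified in the question), compares a finite-difference derivative of a Monte Carlo estimate of $M(\beta)$ with the Monte Carlo estimate of $\mathbf{E}\big(Z(\beta:Y,X)\big)$.

```python
import numpy as np

# --- Check the Taylor bound log(1 + exp(eta)) <= log(2) + eta/2 + eta^2/8. ---
# It holds because the second derivative psi(eta) * (1 - psi(eta)) is at most 1/4.
eta = np.linspace(-50.0, 50.0, 100_001)
lhs = np.logaddexp(0.0, eta)                     # numerically stable log(1 + exp(eta))
rhs = np.log(2.0) + eta / 2.0 + eta**2 / 8.0
print("bound holds on grid:", bool(np.all(lhs <= rhs + 1e-12)))

# --- Monte Carlo check that d/d(beta) M(beta) = E[Z(beta; Y, X)]. ---
rng = np.random.default_rng(0)

def psi(t):
    """Logistic link psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

beta0, n = 1.0, 500_000                          # hypothetical truth and sample size
X = rng.standard_normal(n)                       # hypothetical design: Q = N(0, 1)
Y = rng.binomial(1, psi(beta0 * X))

def M_hat(beta):
    """Sample average approximating M(beta)."""
    e = beta * X
    return np.mean(Y * np.log(psi(e)) + (1 - Y) * np.log(1.0 - psi(e)))

def Z_bar(beta):
    """Sample average approximating E[Z(beta; Y, X)] = E[X * (Y - psi(X * beta))]."""
    return np.mean(X * (Y - psi(beta * X)))

beta, h = 0.5, 1e-4
print("finite difference of M:", (M_hat(beta + h) - M_hat(beta - h)) / (2 * h))
print("E[Z] estimate:         ", Z_bar(beta))    # should agree to several decimals
```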

Zack Fisher

There was a slight error in part (ii) of my question. I have made the change: $M(\beta)$ should have been $M^\prime(\beta)$, its derivative.

Here is my own approach: please provide your comments.

For all $(y,x)\in\{0,1\}\times \mathcal{X}$, it is easy to check that the map $\beta\mapsto y\log{\psi(x\beta)}+(1-y)\log{\{1-\psi(x\beta)\}}$ is concave, and strictly concave whenever $x\neq 0$, where $\mathcal{X}$ is the sample space for $X$. I am assuming $\beta\in\mathbb{R}$. Let $P_0$ denote the joint d.f. of $(Y,X)$ under $\beta_0$. As $Q(x)$ is non-degenerate, $X\neq 0$ with positive probability, and hence $$ \beta\mapsto M(\beta) = \int \left[y\log{\psi(x\beta)}+(1-y)\log{\{1-\psi(x\beta)\}}\right]~dP_0(y,x) $$ will also be strictly concave(?). Now, $M(\beta)\leq 0$ for all $\beta$, since $\psi(\cdot)\in (0,1)$.
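For reference, the concavity claim can be checked by differentiating twice (this computation is not in the original post): $$ \frac{d^2}{d\beta^2}\Big[y\log\psi(x\beta)+(1-y)\log\{1-\psi(x\beta)\}\Big] = -x^2\,\psi(x\beta)\big\{1-\psi(x\beta)\big\} \;\leq\; 0, $$ with strict inequality whenever $x\neq 0$.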

My doubt: Is it possible that a strictly concave map defined on the real line takes the value $-\infty$ at some point $u\in \mathbb{R}$? I need some help on this.

Now I assume: $M(\beta)>{}-\infty$ for all $\beta$. So, at any $\beta\in\mathbb{R}$, and for any sequence $a_n\rightarrow 0$ with $a_n\neq 0$, we have \begin{align*} &\lim_{a_n\rightarrow 0} \frac{M(\beta + a_n) - M(\beta)}{a_n} \\ &=\lim_{a_n\rightarrow 0} \int y\frac{[\log{\psi(x\beta + x a_n)}-\log{\psi(x\beta)}]}{a_n}~dP_0\\ &\ {}+\lim_{a_n\rightarrow 0} \int (1-y)\frac{\big[\log{\{1-\psi(x\beta + x a_n)\}}-\log{\{1-\psi(x\beta)\}}\big]}{a_n}~dP_0. \end{align*} Consider the first term and write $g_1(u) = \log{\psi(u)}$. Using the mean value theorem, at each fixed $(y,x)$, we have $$ g_1(x\beta + xa_n) - g_1(x\beta) = (xa_n)\cdot g^\prime_1(x\beta + \eta_{x,n} (xa_n)),\quad\text{for some $\eta_{x,n}\in (0,1)$.} $$ Here, $g^\prime_1(u) = 1- \psi(u)$. Hence, the first term on the rhs becomes \begin{align*} \lim_{a_n\rightarrow 0} \int yx\big\{1 - \psi(x\beta + \eta_{x,n}(xa_n))\big\}~dP_0 = \lim_{n\rightarrow\infty}\int g_n(y,x)~dP_0, \end{align*} where $g_n(y,x)$ is the function inside the integral. Note that $|g_n(y,x)|\leq 2|x|$, which is $Q(\cdot)$-integrable, by assumption. Also, for each $(y,x)$, $g_n(y,x)\rightarrow yx\{1-\psi(x\beta)\}$. Now, using the dominated convergence theorem, the first term on the rhs converges to $\int x y\{1-\psi(x\beta)\}~dP_0$. Similarly, the second term converges to $\int x (1-y)\{-\psi(x\beta)\}~dP_0$. Adding the two limits gives $\int x\{y-\psi(x\beta)\}~dP_0(y,x)$, that is, $M^\prime(\beta) = \mathbf{E}\big(Z(\beta:Y,X)\big)$.
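For completeness (this small computation is left implicit in the "Similarly" step above), the second term uses $g_2(u)=\log\{1-\psi(u)\}$, for which $$ g_2(u) = -\log\{1+\exp(u)\},\qquad g_2^\prime(u) = -\frac{\exp(u)}{1+\exp(u)} = -\psi(u), $$ so the same mean-value and dominated-convergence argument applies to the second term, with the same dominating function $2|x|$.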

My doubt: Is the way I have used the mean value theorem inside the integral, at each fixed $(y,x)$, rigorous, or have I missed anything?

I would highly appreciate your comments.

cusat15