
I conjecture that the following inequality

$$\sum_{i=1}^n g (p_i) \ge \sum_{i=1}^n g \left (\frac{-p_i \log p_i}{H(\boldsymbol{p})} \right )$$

holds for any continuous convex function $g$ and any probability vector $\boldsymbol{p}=(p_1,\dots,p_n)\ge 0$ with $\sum_{i=1}^np_i=1$, where $H(\boldsymbol{p})=-\sum_{i=1}^n p_i\log p_i$ denotes the Shannon entropy of the probability distribution $\boldsymbol{p}$. This is equivalent to the following majorization relation:

$$ -\boldsymbol{p}\log \boldsymbol{p} \prec H(\boldsymbol{p})\boldsymbol{p}.$$
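
As a quick numerical sanity check of this relation (not a proof), one can compare the partial sums of the decreasingly sorted vectors for random probability vectors. Below is a minimal Python/NumPy sketch of such a check; the helper `majorizes` and the random search are only illustrative choices of mine.

```python
import numpy as np

def majorizes(a, b, tol=1e-12):
    """Return True if a majorizes b: equal sums and dominating partial sums
    of the decreasingly sorted vectors."""
    ca = np.cumsum(np.sort(a)[::-1])
    cb = np.cumsum(np.sort(b)[::-1])
    return np.all(ca >= cb - tol) and abs(ca[-1] - cb[-1]) < tol

rng = np.random.default_rng(0)
for _ in range(100_000):
    n = int(rng.integers(2, 6))
    p = rng.dirichlet(np.ones(n))            # random probability vector
    h = -np.sum(p * np.log(p))               # Shannon entropy H(p)
    if not majorizes(h * p, -p * np.log(p)):
        print("possible counterexample:", p)
        break
else:
    print("no counterexample found in this random search")
```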

There are many other equivalent conditions for a majorization relation; see page 14 of this monograph on majorization for a summary (page 45 of the pdf). Hence, proving a majorization relation has many interesting and non-trivial consequences. This is why I think settling this relation is important: as far as I could check, it cannot be obtained simply from the known procedures for generating majorization relations (see Section 5 of the book).

In part of my answer to this question, you can find a proof for the case where all probabilities are less than $e^{-1}$; there it is equivalently shown that $\boldsymbol{p}$ majorizes the normalized vector $\small \frac{-\boldsymbol{p}\log \boldsymbol{p}}{H(\boldsymbol{p})}$.

I verified the above conjecture for $\color{green}{n=2}$ by finding the doubly stochastic matrix that satisfies the following equation (the existence of such a matrix is equivalent to the above majorization):

$$ \begin{bmatrix} -p_1\log p_1 \\ -p_2\log p_2 \end{bmatrix}= \begin{bmatrix} x & 1-x \\ 1-x & x \end{bmatrix} \begin{bmatrix} p_1\left (-p_1\log p_1-p_2\log p_2 \right) \\ p_2\left (-p_1\log p_1-p_2\log p_2 \right) \end{bmatrix} $$

where $x$ is given by ($p_2=1-p_1$):

$$x=\frac{p_1\log p_1}{\left(2p_1-1\right)\left(p_1\log p_1+\left(1-p_1\right)\log\left(1-p_1\right)\right)}-\frac{1-p_1}{\left(2p_1-1\right)},$$

which always lies in $[0,1]$ for every $p_1 \in (0,1)$; at $p_1=\tfrac12$ the expression is indeterminate, but there the two vectors coincide, so any doubly stochastic matrix works (source-2).
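
Below is a minimal Python/NumPy sketch (my own; the grid, tolerances, and the skipped point $p_1=\tfrac12$ are arbitrary choices) that evaluates the closed-form $x$ on a grid and checks both $x\in[0,1]$ and the first row of the matrix equation above.

```python
import numpy as np

def x_of(p1):
    """Closed-form x from the matrix equation above (p2 = 1 - p1, natural log)."""
    s = p1 * np.log(p1) + (1 - p1) * np.log(1 - p1)      # equals -H(p)
    return p1 * np.log(p1) / ((2 * p1 - 1) * s) - (1 - p1) / (2 * p1 - 1)

for p1 in np.linspace(0.01, 0.99, 99):
    if abs(p1 - 0.5) < 1e-9:
        continue                     # the formula is 0/0 at p1 = 1/2; skip it
    x = x_of(p1)
    assert 0.0 <= x <= 1.0
    # check the first row of the doubly stochastic equation
    p2 = 1 - p1
    h = -(p1 * np.log(p1) + p2 * np.log(p2))
    assert abs(-p1 * np.log(p1) - (x * p1 + (1 - x) * p2) * h) < 1e-9
print("x stayed in [0, 1] and the equation held on the tested grid")
```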

Update 1:

  • In an answer below, I proved a related weaker result.

  • To reach a proof or a counterexample, one should notice that when all probabilities are less than $e^{-1}$, the claim holds. Thus, there are two remaining cases to examine: when exactly one or exactly two of the probabilities are greater than $e^{-1}$ (at most two can exceed $e^{-1}$, since the probabilities sum to $1$).

  • I also verified the conjecture for $\color{green}{n=3,4}$ by numerically checking the following equivalent condition (source-3, source-4); a numerical sketch is given after the display below:

$$\sum_{i=1}^n \max \left (p_i-C, 0 \right ) \ge \sum_{i=1}^n \max \left (\frac{-p_i \log p_i}{H(\boldsymbol{p})}-C, 0 \right ) \quad \text{for all } C \in \mathbb R. $$
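
A minimal Python/NumPy sketch of this check could look as follows; it is only an illustration, using the fact that the difference of the two sides is piecewise linear in $C$, so it suffices to test $C$ at the entries of the two vectors.

```python
import numpy as np

def max_condition_holds(p, tol=1e-12):
    """Check sum_i max(p_i - C, 0) >= sum_i max(q_i - C, 0) for all real C,
    with q = -p*log(p)/H(p).  The difference of the two sides is piecewise
    linear in C and vanishes below the smallest and above the largest entry,
    so it suffices to test C at the breakpoints, i.e. the entries of p and q."""
    h = -np.sum(p * np.log(p))
    q = -p * np.log(p) / h
    for c in np.concatenate([p, q]):
        if np.maximum(p - c, 0).sum() < np.maximum(q - c, 0).sum() - tol:
            return False
    return True

rng = np.random.default_rng(1)
for n in (3, 4):
    ok = all(max_condition_holds(rng.dirichlet(np.ones(n))) for _ in range(50_000))
    print(f"n = {n}: condition held for all sampled vectors: {ok}")
```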

  • The original question has led to a lot of interesting discussion! I am very interested in seeing a general proof after seeing all of the partial results so far – whpowell96 Aug 08 '24 at 10:23
  • @whpowell96 I hope so as I just verified it for $n=3,4$. This is a nice and non-trivial claim that I really like, but it has so far resisted full proof :-) – Amir Aug 08 '24 at 10:56
  • The bounty has expired and no progress has been made in solving the problem, so I posted the problem in this MO question. – Amir Aug 16 '24 at 09:10

1 Answer


Here I establish a proof for

$$\frac{1}{2}\left( \boldsymbol{p}-\frac{\boldsymbol{p} \log \boldsymbol{p}}{H(\boldsymbol{p})} \right ) \prec \boldsymbol{p} \tag{1}$$

First consider the following lemma:

Lemma. Let $\boldsymbol{u}=(u_1,\dots,u_n) \ge 0$ with $\sum_{i=1}^n u_i>0$, and let $f: \mathbb R_{\ge 0} \to \mathbb R_{\ge 0}$ be a function such that $\frac{f(x)}{x}$ is decreasing and $f$ is increasing on the set $\{u_1,\dots,u_n \}$, with $\sum_{i=1}^n f(u_i)>0$. Then $\frac{\boldsymbol{u}}{\sum_{i=1}^n u_i}$ majorizes the vector $\frac{(f (u_1),\dots,f (u_n))}{\sum_{i=1}^n f(u_i)}$.

This is a slightly stronger version of statement F3 proved in my answer [1], and a similar proof can be obtained for it.

To use the above lemma, let us define $f(x)=x-\frac{x \log x}{H(\boldsymbol{p})}$ with $H(\boldsymbol{p})=-\sum_{i=1}^n p_i\log p_i$. Indeed, $\frac{f(x)}{x}=1-\frac{\log x}{H(\boldsymbol{p})}$ is a decreasing function, and $f$ is increasing on the set $\{p_1,\dots,p_n \}$, as proven in the last part of @fedja's answer [2]. Moreover, $\sum_{i=1}^n f(p_i)=\sum_{i=1}^n p_i-\frac{\sum_{i=1}^n p_i\log p_i}{H(\boldsymbol{p})}=1+1=2$. Therefore, the above lemma implies that $\boldsymbol{p}=(p_1,\dots,p_n)$ majorizes the vector

$$\frac{1}{2}\left (p_1-\frac{p_1 \log p_1 }{H(\boldsymbol{p})},\dots,p_n-\frac{p_n \log p_n }{H(\boldsymbol{p})}\right)=\frac{\boldsymbol{p}}{2}-\frac{\boldsymbol{p} \log \boldsymbol{p}}{2H(\boldsymbol{p})}.$$
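
As an optional sanity check (not needed for the argument), relation (1) can also be verified numerically on random probability vectors, e.g. with the following minimal Python/NumPy sketch.

```python
import numpy as np

def majorizes(a, b, tol=1e-12):
    # a majorizes b iff the partial sums of the decreasingly sorted vectors dominate
    ca, cb = np.cumsum(np.sort(a)[::-1]), np.cumsum(np.sort(b)[::-1])
    return np.all(ca >= cb - tol) and abs(ca[-1] - cb[-1]) < tol

rng = np.random.default_rng(2)
for _ in range(100_000):
    p = rng.dirichlet(np.ones(int(rng.integers(2, 7))))
    h = -np.sum(p * np.log(p))                           # Shannon entropy H(p)
    assert majorizes(p, 0.5 * (p - p * np.log(p) / h))   # relation (1)
print("relation (1) held for all sampled vectors")
```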


From (1) we cannot immediately infer that $$-\frac{\boldsymbol{p} \log \boldsymbol{p}}{H(\boldsymbol{p})} \prec \boldsymbol{p} \tag{2}$$ unless all $p_i$, $i=1,\dots,n$, are less than $e^{-1}$. Indeed, since $\frac{1}{2}f$ is increasing on $\{p_1,\dots,p_n\}$, the vectors $\frac{\boldsymbol{p}}{2}-\frac{\boldsymbol{p} \log \boldsymbol{p}}{2H(\boldsymbol{p})}$ and $\boldsymbol{p}$ are similarly ordered: for every $i\in[n]$, their $i$th largest entries occur at the same position. Hence, denoting the decreasing rearrangement of $\boldsymbol{p}$ by $p_{[1]}\ge \dots \ge p_{[n]}$, the definition of majorization applied to (1) gives

$$ \sum_{i\le j}p_{[i]} \ge \sum_{i\le j} -\frac{p_{[i]} \log p_{[i]}}{H(\boldsymbol{p})}, j=1,\dots,n-1,$$

which implies $-\frac{\boldsymbol{p} \log \boldsymbol{p}}{H(\boldsymbol{p})} \prec \boldsymbol{p}$ when $-x\log x$ is increasing on $\{p_1,\dots,p_n \}$, i.e., when all probabilities are less than $e^{-1};$ otherwise, $-\frac{p_{[i]} \log p_{[i]}}{H(\boldsymbol{p})}$ may not be the $i$th largest element in the vector $-\frac{\boldsymbol{p} \log \boldsymbol{p}}{H(\boldsymbol{p})}$.
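
For instance (a small illustrative example of mine, using the natural logarithm): for $\boldsymbol{p}=(0.7,0.2,0.1)$ the entry $0.7$ exceeds $e^{-1}$, and the components of $-\boldsymbol{p}\log\boldsymbol{p}$ are no longer ordered like those of $\boldsymbol{p}$.

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])       # one entry exceeds 1/e (about 0.368)
print(-p * np.log(p))               # approx [0.250, 0.322, 0.230]
# p is sorted decreasingly, but -p*log(p) is not: the ordering of the
# components changes, so the partial-sum bounds obtained from (1) do not
# directly give the majorization (2).
```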

Hence, from (1) I could not get anything stronger than what I obtained earlier in my answer [1] using a different method. In fact, $-\frac{\boldsymbol{p} \log \boldsymbol{p}}{H(\boldsymbol{p})} \prec \boldsymbol{p}$ implies (1); however, the converse may not be true.

  • I agree with everything up to the last passage. It is true that $p$ majorizes $\frac 12(p-\frac{p\log p}{H(p)})$ and that if $p_i$ are arranged in the decreasing order, it follows that $\sum_{i\le k}p_i\ge \sum_{i\le k}\frac{-p_i\log p_i}{H(p)}$, but the terms on the RHS are not necessarily in the decreasing order now. So, to make the final step to majorization, you need more work (if it is possible at all). – fedja Aug 08 '24 at 02:06
  • @fedja Thanks! I just edited both answer and OP. As there are only two cases that are not covered, i.e., when one or two probabilities are greater than $e^{-1}$, you may have some ideas to design a proof or a counterexample. I also verified the claim based on an equivalent condition for $n=3, 4$. I am wondering whether your proof given for $f(x)^2$ in your answer to my previous question can be adjusted for $\max(f(x)-C,0)$ or not. – Amir Aug 08 '24 at 10:47