Is the expected number of primes in a specific interval $[p_n^2,p_{n+1}^2]$ approximately $p_n$?

Question

The expected (average?) number of primes in the interval $[p_n^2,p_{n+1}^2]$ is approximately $p_n$.

While thinking about a completely different problem, I noticed the above relationship, which I regard as very pretty. I suspect that the relationship is well known, although I have found no mention of it in a cursory search of references.

My question: I want to confirm that the reasoning that led me to this observation is valid.

My reasoning: For sufficiently large $p_n$, the expected gap $p_{n+1}-p_n \approx \ln{p_n}$. Hence, $p_{n+1} \approx p_n+\ln{p_n} \Rightarrow p_{n+1}^2 \approx p_n^2+(2\ln{p_n})\cdot p_n+(\ln{p_n})^2$. The interval between the two squares would have the size $p_{n+1}^2-p_n^2 \approx (2\ln{p_n})\cdot p_n$. The average difference between primes at numbers of the magnitude $p_n^2$ is just $\ln{p_n^2}=2\ln{p_n}$. Ergo, the expected number of primes in the interval is $\approx p_n$.

Of course, this is a very general statement, not a hard and fast rule. It would not apply, for example, when $p_n,p_{n+1}$ are twin primes, or otherwise unusually close together. By the same token, occasional unusually large gaps between consecutive primes would produce intervals between their squares containing a comparative excess of primes. I just want to know if the general relationship is a valid inference.
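The heuristic above can be probed numerically. The following is a minimal sketch, not part of the original question: the helper names `primes_up_to` and `single_interval_rows` and the cutoff 60 are illustrative assumptions. It counts the primes between $p_n^2$ and $p_{n+1}^2$ exactly and compares that with the estimate (interval length)$/\ln p_n^2$, which works out to roughly $p_n\cdot(p_{n+1}-p_n)/\ln p_n$, so the count for a single interval follows the actual gap rather than $p_n$ alone.

```python
from bisect import bisect_right
from math import isqrt, log


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, in increasing order."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def single_interval_rows(max_prime):
    """For consecutive primes p < q <= max_prime, return tuples
    (p, gap, exact count of primes between p^2 and q^2, heuristic estimate)."""
    small = primes_up_to(max_prime)
    big = primes_up_to(small[-1] ** 2)
    rows = []
    for p, q in zip(small, small[1:]):
        # pi(q^2) - pi(p^2); the endpoints are squares, hence never prime.
        count = bisect_right(big, q * q) - bisect_right(big, p * p)
        # Heuristic from the question: interval length divided by ln(p^2).
        estimate = (q * q - p * p) / log(p * p)
        rows.append((p, q - p, count, estimate))
    return rows


if __name__ == "__main__":
    for p, gap, count, estimate in single_interval_rows(60):
        print(f"p={p:3d} gap={gap:2d} count={count:3d} "
              f"count/p={count / p:5.2f} estimate/p={estimate / p:5.2f}")
```

Printing `count/p` next to the gap makes it easy to see how strongly the ratio depends on whether the two primes are close together or far apart.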

Your title ("The number of primes...is approximately $p_n$") and your first sentence ("The expected (average?) number of primes...is approximately $p_n$") are not at all the same thing. The statement in the title is certainly wrong, whatever you mean by "approximately". — TonyK, Apr 15 '20 at 17:34

Gary · Accepted Answer · 2020-04-18T07:39:55.203

Saying that the average number of primes in the interval $\left[ p_{n}^2, p_{n+ 1}^2 \right]$ is $p_n$ should mean that $$ \frac{1}{n}\sum\limits_{k = 1}^n {(\pi (p_{k + 1}^2 ) - \pi (p_k^2 ))} \approx p_n . $$ However, by the prime number theorem $$ \pi (x) = \frac{x}{{\log x}}\left( {1 + \mathcal{O}\!\left( {\frac{1}{{\log x}}} \right)} \right), $$ $$ p_{n + 1} ,p_n = n\log n\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right), $$ $$ \log p_{n + 1} ,\log p_n = \log n\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right), $$ whence \begin{align*} \frac{1}{n}\sum\limits_{k = 1}^n {(\pi (p_{k + 1}^2 ) - \pi (p_k^2 ))} & = \frac{1}{n}(\pi (p_{n + 1}^2 ) - \pi (4)) = \frac{{p_{n + 1}^2 }}{{2n\log p_{n + 1} }}\left( {1 + \mathcal{O}\!\left( {\frac{1}{{\log n}}} \right)} \right) \\ & = \frac{{p_n }}{2}\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right). \end{align*} Thus, the average is half of what you expected.
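The telescoping step and the factor $1/2$ can be checked on small cases. This is only a sketch under stated assumptions: `primes_up_to` and `average_count` are hypothetical helper names, and small $n$ cannot confirm an asymptotic statement, only illustrate it.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, in increasing order."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def average_count(max_prime):
    """Return (n, p_n, average), where p_{n+1} is the largest prime <= max_prime
    and average = (1/n) * sum_{k=1..n} (pi(p_{k+1}^2) - pi(p_k^2))."""
    small = primes_up_to(max_prime)
    big = primes_up_to(small[-1] ** 2)
    counts = [
        bisect_right(big, q * q) - bisect_right(big, p * p)
        for p, q in zip(small, small[1:])
    ]
    n = len(counts)
    # Telescoping: the counts add up to pi(p_{n+1}^2) - pi(4), and pi(4) = 2.
    assert sum(counts) == len(big) - 2
    return n, small[-2], sum(counts) / n


if __name__ == "__main__":
    for max_prime in (50, 200, 1000):
        n, p_n, avg = average_count(max_prime)
        print(f"n={n:4d} p_n={p_n:5d} average={avg:9.2f} average/p_n={avg / p_n:.3f}")
```

The printed ratio `average/p_n` is the quantity the derivation predicts to approach $1/2$, with a relative error of order $\log\log n/\log n$ that decays slowly.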

Addendum: For the other average, the mean of the ratios $(\pi (p_{k + 1}^2 ) - \pi (p_k^2 ))/p_k$, we have the lower bound \begin{align*} & \sum\limits_{k = 1}^n {\frac{{\pi (p_{k + 1}^2 ) - \pi (p_k^2 )}}{{p_k }}} = \frac{{\pi (p_{n + 1}^2 )}}{{p_n }} - \frac{{\pi (4)}}{2} - \sum\limits_{k = 1}^{n - 1} {\pi (p_{k + 1}^2 )\left( {\frac{1}{{p_{k + 1} }} - \frac{1}{{p_k }}} \right)} \\ & = \frac{{\pi (p_{n + 1}^2 )}}{{p_n }} - \frac{{\pi (4)}}{2} + \sum\limits_{k = 1}^{n - 1} {\frac{{\pi (p_{k + 1}^2 )}}{{p_{k + 1} p_k }}(p_{k + 1} - p_k )} \\ & = \frac{n}{2}\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right) + \sum\limits_{k = 1}^{n - 1} {\frac{{\pi (p_{k + 1}^2 )}}{{p_{k + 1} p_k }}(p_{k + 1} - p_k )} \\ & \ge \frac{n}{2}\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right) + \sum\limits_{k = 1}^{n - 1} {\frac{1}{{p_{k + 1} p_k}}\frac{{p^2_{k + 1} }}{{\log p_{k + 1}^2 }}(p_{k + 1} - p_k )} \\ & \ge \frac{n}{2}\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right) + \sum\limits_{k = 1}^{n - 1} {\frac{1}{{p_{k + 1} p_k }}\frac{{p_{k + 1} p_k }}{{\log p_{k + 1}^2 }}(p_{k + 1} - p_k )} \\ & = \frac{n}{2}\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right) + \frac{1}{2}\sum\limits_{k = 1}^{n - 1} {\frac{1}{{\log p_{k + 1} }}(p_{k + 1} - p_k )} \\ & \ge \frac{n}{2}\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right) + \frac{1}{2}\frac{1}{{\log p_n }}\sum\limits_{k = 1}^{n - 1} {(p_{k + 1} - p_k )} \\ & = n\left( {1 + \mathcal{O}\!\left( {\frac{{\log \log n}}{{\log n}}} \right)} \right). \end{align*} Here, I used summation by parts, the above asymptotics, and the lower bound $\pi(x)\geq x/\log x$ (valid for $x\geq 17$; the finitely many terms with $p_{k+1}^2<17$ change the sum only by $\mathcal{O}(1)$). Thus $$ \liminf_{n \to + \infty } \frac{1}{n}\sum\limits_{k = 1}^n {\frac{{\pi (p_{k + 1}^2 ) - \pi (p_k^2 )}}{{p_k }}} \ge 1. $$ We still need the corresponding upper bound to prove your claim.
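For the ratio average in this addendum, a small numerical illustration may help. It is a sketch with hypothetical helper names (`primes_up_to`, `mean_ratio`), and a finite computation says nothing rigorous about the limit or about the missing upper bound.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, in increasing order."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def mean_ratio(max_prime):
    """Mean over k = 1..n of (pi(p_{k+1}^2) - pi(p_k^2)) / p_k,
    where p_{n+1} is the largest prime <= max_prime."""
    small = primes_up_to(max_prime)
    big = primes_up_to(small[-1] ** 2)
    ratios = [
        (bisect_right(big, q * q) - bisect_right(big, p * p)) / p
        for p, q in zip(small, small[1:])
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    for max_prime in (50, 200, 419, 1000):
        print(f"max_prime={max_prime:5d} mean ratio={mean_ratio(max_prime):.3f}")
```

Unlike the plain average of the counts, this mean weights every interval equally after dividing by $p_k$, so the early intervals with their comparatively large ratios influence it for quite a while.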

$\frac{1}{n}\sum_{k=1}^n{(\pi (p_{k+1}^2 )-\pi (p_k^2))}$ averages the number of primes in each of $n$ intervals. I meant to ask about the number of primes in the single interval where $k=n$; apologies for any unclarity. For $k<n$, most intervals will be smaller and contain fewer primes than the interval where $k=n$, so the expected number of primes in that terminal interval (I think $p_n$) should be greater than the average number of primes in all intervals (you find $\frac{p_n}{2}$). If my original reasoning is wrong, I would like to understand why. — Keith Backman, Apr 17 '20 at 18:53
I believe this is the best you can say. You cannot say anything about one particular interval. A bigger interval does not necessarily contain more primes. For example, there are 13 primes between 1900 and 2000, but there are 15 between 0 and 50. In fact, the primes become less and less common on average. If you compute $\pi (p_{n + 1}^2 ) - \pi (p_n^2 )$ for, say, the first 50 integers $n$, you will see that there is a huge fluctuation. It is the average value that behaves nicely ($\sim p_n/2$). – Gary, Apr 17 '20 at 20:00
The asymptotics for the $n$th prime do not say anything concrete about the size of $p_{n + 1}^2 - p_n^2$. If they did, they would say something about $p_{n + 1} - p_n$, which we know is not possible (it behaves very irregularly). Thus you cannot derive simple asymptotics for $\pi (p_{n + 1}^2 ) - \pi (p_n^2 )$, because the two arguments are not distinguishable by simple asymptotics. – Gary, Apr 17 '20 at 20:04
I computed $\frac{1}{n\cdot p_n}\sum_{k=1}^n{(\pi (p_{k+1}^2)-\pi(p_k^2))}$ for $n$ ranging between $2$ and $80$, and above $n=20$ it remains very close to $0.5$. This agrees with your answer. I also computed $\frac{(\pi (p_{k+1}^2)-\pi(p_k^2))}{p_k}$ for the same range. Those numbers do fluctuate greatly, but the average of the first $80$ such ratios is very close to $1$. This agrees with my understanding as proposed. I will extend my calculations when I have time, but to this point the actual calculated numbers seem to agree with my proposal. — Keith Backman, Apr 18 '20 at 03:00
@Gary Your method is very similar to mine. Yours has stronger error estimates, which come from the error term in the Prime Number Theorem. Of course, if we use a stronger error term there, then we obtain a stronger error term here too. – Sungjin Kim, Apr 18 '20 at 03:29
@Gary Thanks for your addendum. I was about to add something similar as an edit to my post, but you did an even better job than I would have. Just one further observation: In the original way that you and Sungjin Kim approached the question, you merely found the total number of primes $<p_{k+1}^2$ and noticed that there were $n$ intervals into which they were assorted. The nature of the intervals became irrelevant. You can assort that number of primes into any $n$ intervals or sets according to any arbitrary rule and the average number of primes per interval or set will be $\frac{p_n}{2}$. — Keith Backman, Apr 18 '20 at 15:00
In our version with Sungjin Kim, it was essential that for the endpoint $a$ of the last interval $\frac{{\pi (a)}}{n} \approx \frac{{p_n }}{2}$, i.e., $\frac{a}{{\log a}} \approx \frac{{n^2 \log n}}{2}$. So, $a$ should be about $n^2\log^2 n$ (which is true if $a=p^2_{n+1}$). — Gary, Apr 18 '20 at 15:39

score 0 · Answer 2 · answered Apr 15 '20 at 23:23


We apply the Prime Number Theorem in the form $$ \pi(x)=\sum_{p\leq x} 1 \sim \frac x{\log x}, $$ and the fact that the $n$-th prime $p_n$ satisfies $p_n\sim n \log n$.

The average number of primes in the intervals $((p')^2, p^2]$ between consecutive prime squares can be written as $$ \frac1{\pi(x)}\sum_{p\leq x} (\pi(p^2)-\pi((p')^2)) $$ where $p'$ is the prime preceding $p$ (for $p=2$, take $\pi((p')^2)=0$).

This sum telescopes, so the average is in fact $$ \frac1{\pi(x)}\pi(p_{\pi(x)}^2). $$ Applying the Prime Number Theorem, the above is asymptotic to $$ \frac{\log x}x \frac{p_{\pi(x)}^2}{\log(p_{\pi(x)}^2)}\sim \frac{p_{\pi(x)}}{2}. $$ Hence, your average with $n=\pi(x)$ is $\sim p_n/2$.
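A minimal sketch of the telescoped average in this form, for readers who want to try small values of $x$. The function name `telescoped_average` is hypothetical, and the convention that the term for $p=2$ is simply $\pi(4)$ (that is, $\pi((p')^2)$ is taken as $0$ there) is an assumption made so that the sum collapses to exactly $\pi(p_{\pi(x)}^2)$.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit, in increasing order."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def telescoped_average(x):
    """Return (average, P / 2), where P is the largest prime <= x and
    average = (1 / pi(x)) * sum over primes p <= x of (pi(p^2) - pi(p'^2))."""
    small = primes_up_to(x)
    big = primes_up_to(small[-1] ** 2)
    total = 0
    previous_square = 0  # assumed convention: pi(p'^2) = 0 when p = 2
    for p in small:
        total += bisect_right(big, p * p) - bisect_right(big, previous_square)
        previous_square = p * p
    # Telescoping: only pi(P^2) survives.
    assert total == len(big)
    return total / len(small), small[-1] / 2


if __name__ == "__main__":
    for x in (11, 100, 1000):
        average, half_p = telescoped_average(x)
        print(f"x={x:5d} average={average:9.2f} p/2={half_p:7.1f}")
```

For example, `telescoped_average(11)` gives an average of `6.0` against `5.5` for $p/2$, since $\pi(121)=30$ and $\pi(11)=5$.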

Sungjin Kim

Very nice answer, but I think you answered a slightly different question than I intended to ask; perhaps that is my fault for using the word "average" inappropriately. You have determined the total count of primes and divided it by the number of intervals, obtaining an average number of primes per interval over all intervals. I meant to ask, for one single interval $[p_n^2,p_{n+1}^2]$, is the expected (typical, average) number of primes in that particular interval approximately $p_n$? Especially, I wished to know if the reasoning behind my conclusion is valid. – Keith Backman Apr 16 '20 at 03:00
