
I was exploring some computer science/number theory challenge sites for fun, and one of them posed the following problem, stated exactly as follows:

Let $$P(n) = \sum_{k=1}^n \varphi(k)$$

Find $P(10^{16})$

I searched for quite a while about this and tried different approaches:

  1. Using the formula $$\varphi(n)= n \cdot \prod_{i=1}^k \frac{p_i-1}{p_i},$$ I tried to calculate each $\varphi(n)$ in the range, but this becomes very inefficient for large $n$. I could get as far as $10^7$ with this approach; beyond that it simply gets too slow.

  2. I tried a different, more direct approach. Wikipedia and Wolfram Alpha suggest similar formulas for calculating $P(n)$ directly: $$P(n) = \sum_{k=1}^n \varphi(k)= \frac12 \cdot \biggl(1+ \sum_{k=1}^n\mu (k)\cdot \Bigl\lfloor {\frac nk} \Bigr\rfloor^2\biggr)$$ This formula seemed a lot more promising. I tried it and managed to get a lot further than $10^7$, but still far from the target. By pre-calculating a sieve of the Möbius function, I could get to a bit less than $10^9$. At that point my memory was insufficient and I couldn't sieve any more values. And even if I could, it still takes a long time and is very far from $10^{16}$.
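For completeness, the per-$n$ totient computation in approach 1 can be sketched like this (a hypothetical reconstruction, since the original code for that approach is not shown; it factors each $n$ by trial division, which is exactly why it cannot scale to $10^{16}$):

```java
public class TotientDirect {
    // phi(n) via the product formula: start from n and, for each distinct
    // prime factor p found by trial division, multiply by (1 - 1/p).
    public static long phi(long n) {
        long result = n;
        for (long p = 2; p * p <= n; p++) {
            if (n % p == 0) {
                result -= result / p;        // apply the factor (1 - 1/p)
                while (n % p == 0) n /= p;   // strip all copies of p
            }
        }
        if (n > 1) result -= result / n;     // one leftover prime factor > sqrt(n)
        return result;
    }

    public static void main(String[] args) {
        System.out.println(phi(10)); // 4
        System.out.println(phi(97)); // 96 (97 is prime)
    }
}
```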

Here is part of the code I used for my second approach, written in Java:

    public static BigInteger PhiSummatoryFunction (long limit)
    {
        BigInteger sum = BigInteger.ZERO;
        int[] m = MoebiusSieve(limit);
        for (int i = 1; i < m.length; i++)
        {
            long q = limit / i; // integer division already floors; avoids double-precision loss near 10^9
            sum = sum.add(BigInteger.valueOf(m[i] * q * q)); // mu(i) * floor(limit/i)^2
        }
        return sum.add(BigInteger.ONE).divide(BigInteger.valueOf(2)); // P(n) = (1 + sum) / 2
    }

Here MoebiusSieve is a function that computes the Möbius function values up to a given limit in a sieve, using an Eratosthenes-like method.
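A sieve along those lines might look like this (a hypothetical sketch of the MoebiusSieve helper, not the poster's actual code): record the smallest prime factor of every index with an Eratosthenes-style pass, then fill $\mu$ multiplicatively.

```java
public class MoebiusSieveDemo {
    public static int[] moebiusSieve(int limit) {
        int[] mu = new int[limit + 1];
        int[] smallestPrime = new int[limit + 1];
        mu[1] = 1;
        // Eratosthenes-style pass: smallestPrime[j] = least prime factor of j
        for (int i = 2; i <= limit; i++)
            if (smallestPrime[i] == 0) // i is prime: nothing smaller marked it
                for (int j = i; j <= limit; j += i)
                    if (smallestPrime[j] == 0)
                        smallestPrime[j] = i;
        // mu(i) = 0 if p^2 | i (for p = least prime factor), else -mu(i/p)
        for (int i = 2; i <= limit; i++) {
            int p = smallestPrime[i];
            int rest = i / p;
            mu[i] = (rest % p == 0) ? 0 : -mu[rest];
        }
        return mu;
    }

    public static void main(String[] args) {
        int[] mu = moebiusSieve(12);
        System.out.println(mu[6]);  // 1  (6 = 2*3, two distinct primes)
        System.out.println(mu[12]); // 0  (divisible by 4)
    }
}
```

Storing $\mu$ as an `int[]` is wasteful, though: as Peter Taylor notes in the comments below, $\mu$ only takes the values $-1, 0, 1$, so two bits per entry suffice.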


  3. After understanding and implementing the recursive method suggested in a link provided in the comments: $$P(n) = \frac {n(n+1)}{2} - \sum_{i=2}^{\lfloor\sqrt n\rfloor} P\!\left(\left\lfloor \frac ni \right\rfloor\right) - \sum_{\substack{j=1 \\ j \ne \lfloor n/j \rfloor}}^{\lfloor\sqrt n\rfloor} P(j) \cdot \left(\left\lfloor \frac nj \right\rfloor - \left\lfloor \frac n{j+1} \right\rfloor\right)$$ (the condition $j \ne \lfloor n/j \rfloor$ skips the boundary term that the first sum already counts; it corresponds to the `if` check in the code below),

I can compute values up to $P(10^{11})$, and with maximum memory allocation, pre-computing as many $\varphi(n)$ values as possible (and consequently all the $P(n)$ values I can) for memoization, I can compute $P(10^{12})$ in just over 20 minutes. A major improvement, but still some way from $P(10^{16})$. It's OK if the computation takes a bit longer, but I fear $P(10^{16})$ would take disproportionately longer, judging by the "jump" in computation time between $P(10^{11})$ and $P(10^{12})$. My memory allows me to "save" up to $350{,}000{,}000$ values of $\varphi(n)$ OR up to $700{,}000{,}000$ values of $\mu(k)$. Perhaps there is a way to perform the summation using $\mu(k)$ values rather than $\varphi(n)$?
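As a quick sanity check on the recursion (and on the double-counting guard that the code below applies when $j = \lfloor n/j \rfloor$), take $n = 10$, so $\lfloor\sqrt n\rfloor = 3$:

$$P(10) = \frac{10 \cdot 11}{2} - \bigl(P(5) + P(3)\bigr) - \bigl(P(1)\cdot(10-5) + P(2)\cdot(5-3)\bigr) = 55 - 14 - 9 = 32,$$

where the $j=3$ term of the second sum is skipped because $\lfloor 10/3 \rfloor = 3 = j$ (that divisor range is already covered by the $i=3$ term of the first sum). Indeed $\sum_{k=1}^{10}\varphi(k) = 1+1+2+2+4+2+6+4+6+4 = 32$.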

All my measurements show that the recursion is the dominant time consumer. That much is obvious, but I am sure it takes longer than it should, as pointed out by qwr. So I am posting the recursion code below, with some documentation. It seems to me that this is the right way to do this computation, but my implementation is not optimal.

    public static BigInteger phiR (long limit, long[] s) // limit is 10^t; s is the sieve of precomputed values of P(n) (can store at most 350,000,000 values)
    {
        if (limit < s.length)
            return BigInteger.valueOf(s[(int) limit]);
        // n(n+1)/2, the n-th triangular number
        BigInteger sum = BigInteger.valueOf(limit).multiply(BigInteger.valueOf(limit + 1)).divide(BigInteger.valueOf(2));
        BigInteger midsum1 = BigInteger.ZERO;
        BigInteger midsum2 = BigInteger.ZERO;
        for (long m = 2; m * m <= limit; m++) // first sum: recurse on floor(limit/m); integer division already floors
            midsum1 = midsum1.add(phiR(limit / m, s));
        for (long d = 1; d * d <= limit; d++) // second sum: weight P(d) by how many arguments share the same floor
            if (d != limit / d) // skip the boundary term already counted in the first sum
                midsum2 = midsum2.add(BigInteger.valueOf(limit / d - limit / (d + 1)).multiply(phiR(d, s)));
        return sum.subtract(midsum1).subtract(midsum2);
    }

qwr suggested using dictionaries, in addition to the array, for big values of $n$, but I don't know anything about them. Can any further improvement bring the time frame down to a day or so?
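In Java, "dictionary" would mean a HashMap keyed by the recursion argument. A minimal sketch of how that could be bolted onto the recursion above (hypothetical code, not the poster's; the class and field names are made up for illustration):

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

public class PhiSumMemo {
    // memo table for arguments too large for the precomputed array
    static Map<Long, BigInteger> cache = new HashMap<>();

    public static BigInteger phiR(long n, long[] s) {
        if (n < s.length)
            return BigInteger.valueOf(s[(int) n]); // small values: precomputed P(n)
        BigInteger cached = cache.get(n);
        if (cached != null)
            return cached; // large values: answered once, then looked up
        // n(n+1)/2 minus the two sums of the recursion
        BigInteger sum = BigInteger.valueOf(n).multiply(BigInteger.valueOf(n + 1))
                                   .divide(BigInteger.valueOf(2));
        for (long m = 2; m * m <= n; m++)
            sum = sum.subtract(phiR(n / m, s));
        for (long d = 1; d * d <= n; d++)
            if (d != n / d) // avoid double-counting the boundary term
                sum = sum.subtract(BigInteger.valueOf(n / d - n / (d + 1)).multiply(phiR(d, s)));
        cache.put(n, sum);
        return sum;
    }

    public static void main(String[] args) {
        long[] s = {0, 1, 2, 4, 6}; // tiny precomputed table: P(0..4)
        System.out.println(phiR(10, s)); // 32
    }
}
```

The point of the map is that only arguments of the form $\lfloor n/m \rfloor$ ever recur, of which there are $O(\sqrt n)$ distinct values, so the map stays small even when $n$ is far beyond what an array could index.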


  • Please don't use $\phi$ for this. Many authors use it for the totient function itself. – darij grinberg Apr 29 '19 at 19:13
  • Well, Möbius is $0$ in a lot of cases. What if you extract the primes, which should take you to the square root, and generate only the cases where $\mu$ is not $0$? – Phicar Apr 29 '19 at 19:14
  • Actually: https://mathproblems123.wordpress.com/2018/05/10/sum-of-the-euler-totient-function/ – darij grinberg Apr 29 '19 at 19:19
  • @Phicar I believe it can improve computations for small ranges, but it would still not get me far enough. I would still need the $\mu$ for the bigger numbers. Also, I fear generating primes up to that range is impossible – MC From Scratch Apr 29 '19 at 19:20
  • @darijgrinberg Following the ideas provided in the link, I can get values up to $P(10^{12})$, as updated in the post. Can any additional improvement be made to get to $P(10^{16})$ in reasonable time, say a day or so? – MC From Scratch Apr 29 '19 at 20:20
  • You may get better code optimization help asking on stackoverflow or codereview stackexchange. That being said, I don't know why you have any while loops at all. If you look at the formula you and I give, there are only two sums which are computed with for loops. Implement the formula directly. – qwr Apr 30 '19 at 03:45
  • @qwr The 'while' loops just determine how many times the values of the floors of $n/j$ change between consecutive $j$'s. Then, for values that are constant over a range, they check how many times each value appears and add the corresponding $P(n)$, multiplied by that factor. It should save some time, but even replacing this part with a straightforward 'for' loop doesn't improve the time whatsoever. – MC From Scratch Apr 30 '19 at 05:03
  • I don't know what you mean by "how many times the values of the floors change". Read my linked answer carefully and make sure you know what the floor functions are doing. I am not asking you to replace the while loops with for loops; I am asking you why you have while loops at all. – qwr Apr 30 '19 at 06:05
  • @qwr I re-edited the code to remove the while loop, it now appears in the question. It still takes the same amount of time. Even by using all possible memory that I can by precomputing all $P(n)$ values up to $350,000,000$, $P(10^{12})$ takes 20 minutes :/ – MC From Scratch Apr 30 '19 at 14:05
  • You'll certainly get a speedup if you memoize all values of $P(n)$. Again I stated so in my linked answer. – qwr Apr 30 '19 at 17:09
  • I don't understand "My memory allows me to "save" up to $350,000,000$ $\varphi(n)$ values OR up to $700,000,000$ $\mu(k)$ values". $\mu(k)$ can only take three different values, so it requires less than 2 bits of storage space. – Peter Taylor Jun 20 '19 at 10:23
  • I managed to compute it just yesterday, @qwr ! Your answer was key in doing it. Thank you! – MC From Scratch Jun 20 '19 at 13:51
  • This site is structured around questions and answers. It gets harder to read when that basic mental model is broken. So if you later find the answer to your own question, the best thing to do is to post it as an answer, not to edit the question. I suggest you remove the update, put it below in an answer, maybe explain the derivation, and click the tick mark to select it as the accepted answer. As a bonus for you, this would allow people to upvote both the question and the answer, giving you 15 rep rather than 5. – Peter Taylor Jun 22 '19 at 12:53

2 Answers


The recursive solution you showed takes $O(n^{2/3})$ time (see my answer). To the best of my knowledge this is as fast as you can get for exact answers using recursive approaches. My implementation in PyPy computes $P(10^{12})$ in 22 seconds (extrapolating, $P(10^{16})$ would take 1-2 days with enough memory), so you'll need to work on the efficiency of your implementation.

My algorithm is almost a straightforward translation of my linked answer. Currently I have a cutoff set at $n^{2/3}$. For all $n$ less than this cutoff, the totients/totient sums are pre-calculated using a Sieve of Eratosthenes approach. For values of $n$ larger than the cutoff, the sums are computed recursively and stored in a dictionary. For $P(10^{16})$ you will probably run out of memory trying to store all totient values and will instead have to load and store from disk.
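The pre-calculation step described above can be sketched as follows (a hypothetical Java rendering, since qwr's actual implementation is in PyPy and not shown): sieve $\varphi$ up to the cutoff with an Eratosthenes-style pass, then take prefix sums to get every $P(n)$ below the cutoff.

```java
public class TotientSieve {
    // Returns P[0..cutoff] with P[i] = phi(1) + ... + phi(i).
    public static long[] totientSums(int cutoff) {
        long[] phi = new long[cutoff + 1];
        for (int i = 0; i <= cutoff; i++)
            phi[i] = i; // start with the identity; primes stay untouched until their own pass
        for (int p = 2; p <= cutoff; p++)
            if (phi[p] == p) // p is prime: no smaller prime modified it
                for (int j = p; j <= cutoff; j += p)
                    phi[j] -= phi[j] / p; // multiply by (1 - 1/p)
        long[] P = new long[cutoff + 1];
        for (int i = 1; i <= cutoff; i++)
            P[i] = P[i - 1] + phi[i]; // prefix sums give the summatory function
        return P;
    }

    public static void main(String[] args) {
        long[] P = totientSums(10);
        System.out.println(P[10]); // 32
    }
}
```

With a cutoff of $n^{2/3}$, everything above the cutoff is handled by the recursion and memoized, which is what gives the overall $O(n^{2/3})$ bound.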

You can scour the references in OEIS A002088 for resources which I often find useful.

On OEIS, only terms up to $P(10^{18})$ are known (A064018). The highest values were computed by Hiroaki Yamanouchi (cursory research indicates he may be the legendary Min_25 on Project Euler, SPOJ, CodeChef, etc.!) on OEIS; you may try contacting him to see how he did it.

qwr
  • This is rather impressive. My $P(10^{11})$ gives the following timings: creating a sieve of $\varphi(k)$ for $1 \leqslant k \leqslant 350,000,000$ (my max memory) takes 9.7s, consequent summation for values of $P(n)$ in range takes 0.4s, and the rest, which is just the recursion takes about 32s. But I am not sure what else could be improved in the implementation of the recursion. It's rather straightforward. Could you please post part of your algo? Also, I edited the post again. – MC From Scratch Apr 29 '19 at 21:27
  • @MCFromScratch I've described my algorithm. It is straightforward. What language are you using? Java? – qwr Apr 29 '19 at 21:50
  • I am using Java. It isn't the best but it should still be able to perform such calculations quite effectively. My cutoff is even larger than yours, but I can't have more than 350,000,000 values for $\varphi$ stored. What does it mean to store in a dictionary? – MC From Scratch Apr 29 '19 at 21:53
  • Python dictionaries are associative arrays https://en.wikipedia.org/wiki/Associative_array (really, just look for Java's hash table or hash map). This saves space because for large $n$ we don't need most values (alternatively you can use an array with a special index). For $10^{16}$ you are storing about $\sqrt n$ totient values. This will make things slower, but I don't know how to speed up the operation without storing a lot of values on disk or something. – qwr Apr 29 '19 at 21:59
  • I don't know if it's actually necessary to store an entire range of totient values but I have only ever used this for at most $P(10^{13})$. With clever storage you can probably bring down the space requirement, though by how much you will have to experiment. – qwr Apr 29 '19 at 22:02
  • I've made a small optimization, realising the floors become constant for some ranges. It didn't seem to improve the computation time whatsoever. I don't understand how the dictionaries might help, because in the end they don't allow me to have more values stored. Obviously I don't understand something here. Nonetheless, I've included my recursion code with documentation. I would really appreciate it if it could be shown how it may be improved. – MC From Scratch Apr 29 '19 at 22:46

Asymptotically you may be able to do better using the approach described in Lagarias and Odlyzko, Computing $\pi(x)$: an analytic method (1987), J. Algorithms vol. 8, pp. 173-191. In section 2 they discuss conditions which suffice to adapt the approach to functions other than $\pi(x)$, and $\Phi(x)$ would seem to meet them. However, it should be noted that although asymptotically their algorithm is $O(x^{1/2 + \varepsilon})$, the hidden constant may be rather higher than that of the algorithm you're currently using.

Peter Taylor
  • Here's a free-to-view article https://arxiv.org/abs/1203.5712 that describes the implementation. According to the author, the benefit of the analytical algorithm is that it is trivially parallelizable. But "the crossover at which this implementation of the analytic algorithm would start to beat the combinatorial method would be in the region of $4 \times 10^{31}$." – qwr Jun 20 '19 at 13:34