The Schönhage–Strassen multiplication algorithm works by turning a multiplication of size $N$ into many multiplications of size $\lg N$ with a number-theoretic transform, and recursing. At least, I think that's what it does; there's some other cleverness involved, and I don't understand it well enough to summarize accurately. It finishes in $O(N \cdot \lg N \cdot \lg \lg N)$ time.

A number-theoretic transform is exactly like a Discrete Fourier Transform, except it's done in the ring $F_{2^N+1}$ of integers modulo $2^N+1$. This makes the operations a lot cheaper, since the Fourier transform involves a lot of multiplying by roots of unity, and the roots of unity in $F_{2^N+1}$ are all powers of 2, so multiplying by them is just shifting! Also, integers are a lot easier to work with than floating-point complex numbers.
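To convince myself of that shifting trick, here's a toy sketch in Python (my own illustration, not from the paper or any real implementation; `mul_by_pow2` is a name I made up):

```python
# In the integers mod 2**n + 1 we have 2**n == -1, so 2 is a 2n-th root of
# unity, and multiplying by any power of 2 reduces to a shift plus one subtraction.

def mul_by_pow2(x, s, n):
    """Return (x * 2**s) mod (2**n + 1) without any multiplication."""
    m = (1 << n) + 1              # the modulus 2**n + 1
    s %= 2 * n                    # 2**(2n) == 1 (mod m), so exponents wrap
    y = x << s                    # the "multiplication" is just a shift...
    lo, hi = y & ((1 << n) - 1), y >> n
    return (lo - hi) % m          # ...and 2**n == -1 makes reduction a subtraction

n = 8                                        # work mod 257
assert mul_by_pow2(1, n, n) == (1 << n)      # 2**n is congruent to -1
assert mul_by_pow2(1, 2 * n, n) == 1         # 2 has order 2n: a 2n-th root of unity
for x in range(257):
    assert mul_by_pow2(x, 5, n) == (x * 32) % 257
```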

Anyway, the thing that confuses me is that $F_{2^N+1}$ is very large: specifying a random element of it takes $O(N)$ bits, so adding two elements should take $O(N)$ time. And the DFT does a lot of adding.

Schönhage–Strassen splits the input into $\frac{N}{\lg N}$ groups of $\lg N$ bits each. These groups are the values of $F_{2^N+1}$ that it will transform. Each pass of the DFT has $O(\frac{N}{\lg N})$ additions/subtractions, and there are $O(\lg \frac{N}{\lg N})$ passes. So, with each addition taking $O(N)$ time, the cost of all those additions should be $O(N \cdot \frac{N}{\lg N} \cdot \lg \frac{N}{\lg N})$, which is asymptotically the same as $O(N^2)$ since $\lg \frac{N}{\lg N} = \Theta(\lg N)$.

We can do a bit better than that. Because the values start out so small, the additions are quite sparse: the additions in the first pass really only cost $O(\lg N)$ each, those in the second pass cost $2^1 \cdot O(\lg N)$ each, and those in the $i$-th pass cost $O(\min(N, 2^i \cdot \lg N))$ each. But that still totals out to a terrible $O(\frac{N^2}{\lg N})$.
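Spelling that out (my own back-of-envelope arithmetic, so take the constants with a grain of salt): summing the per-pass costs over all the passes gives

$$\sum_{i=0}^{\lg\frac{N}{\lg N}} \underbrace{O\!\left(\tfrac{N}{\lg N}\right)}_{\text{adds per pass}} \cdot \underbrace{O\!\left(\min(N,\ 2^i \lg N)\right)}_{\text{cost per add}} \le O\!\left(\frac{N}{\lg N} \cdot \lg N \sum_{i=0}^{\lg\frac{N}{\lg N}} 2^i\right) = O\!\left(\frac{N}{\lg N} \cdot \lg N \cdot \frac{2N}{\lg N}\right) = O\!\left(\frac{N^2}{\lg N}\right),$$

dominated by the geometric sum's last term.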

How is Schönhage–Strassen making the additions cheap? How much do they cost overall?

Is it because the algorithm actually uses $F_{N+1}$ (with $N$ guaranteed to be a power of 2)? There are enough stacked $2^{2^n}$'s and enough German in the paper that I'm really not sure. On the other hand, I don't think that guarantees enough roots of unity for things to work out.

Craig Gidney

2 Answers

According to the Wikipedia article, at each step the length of the integers is reduced from $N$ to (roughly) $\sqrt{N}$, and there are (roughly) $\sqrt{N}$ of them, and so the additions only cost $O(N)$. There is a detailed analysis of the running time in the final paragraph of the linked section, copied here in case it changes:

In the recursive step, the observation is used that:

  • Each element of the input vectors has at most $N/2^k$ bits;
  • The product of any two input vector elements has at most $2N/2^k$ bits;
  • Each element of the convolution is the sum of at most $2^k$ such products, and so cannot exceed $2N/2^k + k$ bits.

Here $N$ is the current input length and $2^k = \Theta(\sqrt{N})$. Arithmetic is done modulo $2^n+1$, where $n$ is some multiple of $2^k$ larger than $2N/2^k + k$; note that $n = \Theta(\sqrt{N})$.
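Making the addition cost explicit (my own accounting, not quoted from the article): each butterfly pass performs $\Theta(2^k)$ additions of $n$-bit numbers, so

$$\underbrace{\Theta(2^k)}_{\text{adds per pass}} \cdot \underbrace{O(n)}_{\text{bits per add}} = O(\sqrt{N}) \cdot O(\sqrt{N}) = O(N) \text{ per pass},$$

and with $k = \Theta(\log N)$ passes, the additions contribute only $O(N \log N)$ overall, comfortably within the $O(N \log N \log\log N)$ budget.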

Yuval Filmus

There's an excellent explanation of exactly what's going on, including how the size of the field goes down as you progress, in GMP's documentation: 15.1.6 FFT Multiplication.

A more theoretically in-depth overview starts on page 55 of Modern Computer Arithmetic (pdf) (well... the browser says page 71 but the page itself says 55). It explains the algorithm's correctness and breaks down the complexity analysis.

Craig Gidney