Which one is fastest? Karatsuba or Montgomery multiplication?

Question

Is there any complexity analysis comparing the Karatsuba and Montgomery multiplication algorithms? Karatsuba seems more general, in the sense that it is not tied to modular arithmetic, while Montgomery is. Does a hybrid approach combining Karatsuba and Montgomery also exist?

fgrieu · Accepted Answer · 2015-11-26T14:39:28.777

Summary: Montgomery only aims at modest (if any) speedup compared to classic algorithms; it is popular for other reasons. Karatsuba allows large speedups for very large parameters, but the threshold where it becomes beneficial is often not reached in cryptographic applications. The techniques can be used together.

Montgomery arithmetic is used for modular multiplication and exponentiation, a common operation in cryptography (RSA, some Elliptic Curve Cryptography systems). At the cost of some pre- and post-computation (of mostly negligible cost when the exponent is big enough to be secret), it simplifies the modular reduction steps, specifically with two benefits:

It avoids possible mis-estimation of the quotient, which in classical algorithms leads to a special case; this is beneficial both from the standpoint of side-channel leakage (e.g. timing attacks) and for the peace of mind it gives (the classical quotient estimation and its special case are hard to get right and test fully).
The Montgomery equivalent of quotient estimation is performed based on the low-order bits of the value to be reduced (rather than high-order), and that eases implementation of multiplication and modular reduction interleaved in the same scan of a temporary result (that interleaving technique in turn limits the width of numbers manipulated to about the size of the modulus, and reduces the number of memory accesses, compared to naively computing a full product then reducing it).
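To illustrate the second point, here is a minimal Python sketch of Montgomery reduction and an exponentiation built on it. It works on whole integers rather than on $w$-bit words, so it shows the arithmetic but not the interleaving; the function names (`montgomery_setup`, `redc`, `mont_pow`) and the choice $R=2^k$ with $k$ the bit length of the modulus are assumptions made for this example, not taken from any particular library.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < R, where R = 2**k."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2

def redc(t, n, k, n_prime):
    """Return t * R^-1 mod n, for 0 <= t < n * R."""
    mask = (1 << k) - 1
    # The multiplier m depends only on the low-order k bits of t.
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k             # exact: the low k bits of t + m*n are zero
    # u < 2n, so one conditional subtraction suffices. This sketch makes no
    # attempt to be constant-time.
    return u - n if u >= n else u

def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply in Montgomery form (n odd, exp >= 0)."""
    k = n.bit_length()
    r, n_prime, r2 = montgomery_setup(n, k)
    x = redc((base % n) * r2, n, k, n_prime)   # base * R mod n
    acc = redc(r2, n, k, n_prime)              # R mod n, i.e. 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = redc(acc * acc, n, k, n_prime)
        if bit == "1":
            acc = redc(acc * x, n, k, n_prime)
    return redc(acc, n, k, n_prime)            # leave Montgomery form
```

The conversions into and out of Montgomery form are the pre- and post-computation mentioned above; they are paid once per exponentiation, not once per multiplication.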

However, Montgomery arithmetic leaves the cost mostly unchanged compared to a comparably good implementation using classical algorithms, accounting for both elementary multiplications and memory accesses. Modular exponentiation with $n$-bit numbers including exponent remains of cost $\mathcal O(n^3)$. More precisely, for both Montgomery and classical algorithms using $w$-bit words, interleaved multiplication and reduction, and basic scanning of a random exponent: $\approx{3\over w^2}n^3$ multiply-and-accumulate operations with double-width result, and $\approx{6\over w^2}n^3$ memory accesses (${3\over4}$ reads, ${1\over4}$ writes).
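As a rough sanity check of the first figure, one plausible way to arrive at it is the following, assuming plain square-and-multiply on a random $n$-bit exponent (about $n$ squarings and $n\over2$ multiplications), squarings counted as full multiplications, and about $\left({n\over w}\right)^2$ multiply-and-accumulate operations each for the product and for the reduction:

$$\underbrace{\left(n+{n\over2}\right)}_{\text{modular multiplications}}\times\underbrace{2\left({n\over w}\right)^2}_{\text{multiply-and-accumulate each}}={3\over w^2}n^3$$

Under the same assumptions, the memory figure corresponds to about two memory accesses per multiply-and-accumulate.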

Karatsuba multiplication is a divide-and-conquer algorithm for (non-modular) multiplication, which for $n$-bit integers reduces cost from the $\mathcal O(n^2)$ for classical multiplication to $\mathcal O(n^{\log_2 3})$, that is $\mathcal O(n^{1.58\dots})$.
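The recursion can be sketched in Python as follows. This is an illustration only: the name `karatsuba` and the parameter `threshold_bits` are made up for this example, and the base case simply falls back to Python's built-in multiplication as a stand-in for the classical algorithm.

```python
def karatsuba(x, y, threshold_bits=64):
    """Multiply non-negative integers x and y by Karatsuba's method."""
    # Below the (hypothetical) threshold, use the base multiplication.
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    # Three half-size products instead of four.
    z2 = karatsuba(x_hi, y_hi, threshold_bits)
    z0 = karatsuba(x_lo, y_lo, threshold_bits)
    z1 = karatsuba(x_hi + x_lo, y_hi + y_lo, threshold_bits) - z2 - z0

    # x*y = z2 * 2^(2*half) + z1 * 2^half + z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

Replacing four half-size products by three at each level is what gives the exponent $\log_2 3$; the extra additions and subtractions are the overhead that makes the method lose for small operands.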

Applied to modular exponentiation with $n$-bit numbers including exponent, the cost goes from $\mathcal O(n^3)$ to $\mathcal O(n^{1+\log_2 3})$, that is $\mathcal O(n^{2.58\dots})$. One of several methods for getting the benefits of Karatsuba multiplication during modular reduction is pre-computing the (non-modular) inverse of the modulus to slightly more than $n$ bits, which can be done at cost $\mathcal O(n^2)$ (thus negligible as far as $\mathcal O$ is concerned) with classical algorithms.
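The precomputed-inverse approach is commonly known as Barrett reduction (that name is my identification, not something stated above). A minimal Python sketch follows, assuming a modulus of exactly $k$ bits and an input below $2^{2k}$; the names `barrett_setup` and `barrett_reduce` are hypothetical.

```python
def barrett_setup(n):
    """Precompute mu = floor(2^(2k) / n), where k is the bit length of n."""
    k = n.bit_length()
    mu = (1 << (2 * k)) // n     # scaled inverse of n, slightly more than k bits
    return k, mu

def barrett_reduce(t, n, k, mu):
    """Return t mod n for 0 <= t < 2**(2k), using two multiplications."""
    # Quotient estimate from the high-order bits of t; it never exceeds
    # the true quotient and is short of it by at most 2.
    q = ((t >> (k - 1)) * mu) >> (k + 1)
    r = t - q * n
    # Correction step: the "special case" of classical quotient estimation.
    while r >= n:
        r -= n
    return r

# Usage: a modular multiplication with a fixed modulus.
n = 1000003
k, mu = barrett_setup(n)
a, b = 123456, 654321
assert barrett_reduce(a * b, n, k, mu) == (a * b) % n
```

The division by $n$ is replaced by two large multiplications (by `mu` and by `n`), and it is those multiplications that can then use Karatsuba.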

Karatsuba multiplication is beneficial only past some threshold for $n$. That threshold varies considerably depending on an awful lot of things. In a hardware multiplier optimized for low power, Karatsuba pays for modest $n$. In the GNU MP Bignum Library, there used to be a default KARATSUBA_THRESHOLD as high as 32 for non-modular multiplication (that is, Karatsuba was used when $n\ge32w$ with typically $w=32$); the optimal threshold for modular exponentiation tends to be significantly higher. On modern CPUs, Karatsuba in software tends to be non-beneficial for things like ECDSA over P-256 ($n=256$, $w=32$ or $w=64$), but conceivably useful for much wider moduli as used in RSA.

Karatsuba multiplication can be used together with Montgomery reduction. A good way to do so is by using big segments of arguments upon which Karatsuba multiplication is used. That could be the case for example in an implementation using Montgomery in the overall algorithm, with wide words and a wide multiplier (possibly hardware) using Karatsuba. Sometimes Karatsuba is used for multiplication, followed by a separate reduction step using Montgomery; in which case the overall speedup allowed by Karatsuba is less than a factor of $2$, irrespective of $n$.

I suggest Modern Computer Arithmetic for more details; or the classic but still useful Handbook of Applied Cryptography, especially chapter 14.
