2

The reason for asking, is that this occurs in real life with CRT calculation of RSA decryption/signing. In CRT RSA, there's the need to calculate subtraction, and it's known negative numbers could occur, especially with multi-prime cases.

While RSA would eventually phase out when moderately-sized quantum computers are available, before QCs can reach that scale, scaling RSA parameters (especially towards using more primes) may be more feasible than rushing to adopt post-quantum schemes in the short term, therefore, it's still a bit relevant.

While it's easy to to do modular arithmetic in variable-time, it seems very difficult do it in constant-time and with limited space for holding working variables.

Q1: Is there a way to implement efficient constant-time modular arithmetic for signed two's-comlement integers in constant time?

Additional Q2 for refuting the use of such implementation: Is there alternative methods for computing CRT RSA decryption/signing that uses no more (or even less) number of working variables? (these "working variables" are big integers).

DannyNiu
  • 10,640
  • 2
  • 27
  • 64

2 Answers2

3

Using that $\forall x\in\mathbb Z$, it holds $x\bmod p\,=\,p-1-((-x-1)\bmod p)$, and that in so-called two's-complement arithmetic $-x-1$ is the bitwise complement of $x$, the other answer's code simplifies to:

uint32_t imod(uint32_t x, uint32_t p) {
    uint32_t s = -(x>>31);          // 0 or -1, with the later iff x<0
    x = x ^ s;                      // conditionally complement x, so that x >= 0
    x = x % p;                      // unsigned modular reduction in constant time
    x = x ^ (((p-1 - x) ^ x) & s);  // conditionally change x to p-1 - x
    return x;
}

This also has the advantage that the code computing x % p is now assured that $x<2^{31}$ on input, instead of having to properly handle $x=2^{31}$ when imod is passed x = 0x80000000 in order to compute $(-2^{31})\bmod p$.


In cryptography, an alternative to signed arithmetic is to avoid negative values entirely. For example, in RSA-CRT with $p>q$, the private-key computation $x\gets y^d\bmod(p\,q)$ can go (with the first three steps performed just once at key generation) $$\begin{align} d_p&\gets d\bmod(p-1)\\ d_q&\gets d\bmod(q-1)\\ q_\text{inv}&\gets q^{-1}\bmod p\\ x&\gets(y\bmod q)^{d_q}\bmod q\\ x&\gets x+q\cdot((p-x+((y\bmod p)^{d_p}\bmod p))\cdot q_\text{inv}\bmod p) \end{align}$$ which involves no negative value, since $p>q\implies p-x>1$ before the last expression. If we don't know that $p>q$ but know that $p$ and $q$ have the same bit size (as is the case for a FIPS 186‑4 key), we can change $p-x$ to $2p-x$.

In multiprime RSA, this needs adaptation. The corresponding part of the calculation (detailed there) involves computing $$x\gets x+m\cdot(((x_i-x)\bmod r_i)\cdot t_i\bmod r_i)$$ where $x_i\ll x$ (typically). Using that $0\le x_i<r_i$, we can rearrange this to either one of $$\begin{align} x&\gets x+m\cdot((r_i-(x\bmod r_i)+x_i)\cdot t_i\bmod r_i)\\ x&\gets x+m\cdot(((r_i-(x\bmod r_i)+x_i)\bmod r_i)\cdot t_i\bmod r_i) \end{align}$$ which involves no negative value, and is asymptotically as efficient as the original.

fgrieu
  • 149,326
  • 13
  • 324
  • 622
2

This answer answers Q1, as there may be merit other than just for calculating CRT RSA (e.g. the extended Euclidean algorithm).


Let's build it step-by-step.

First let's assume you've implemented the % operator for unsigned modular arithmetic in constant time, when you encounter a negative number x, we have the following:

x === -(-x)

Introducing the modulus p, and we have:

x (mod p) === p - (-x) (mod p)

Hence, the algorithm for negative numbers (in pseudo-code, not constant-time yet):

imod(x):
    x = -x
    x = x % p
    if x > 0: x = p - x
    return x

Now, we want to generalize it to apply to also positive numbers, here's the full code in C (constant-time this time):

uint32_t imod(uint32_t x, uint32_t p)
{
    uint32_t neg = -(x >> 31);
    uint32_t z;
// conditionally negate x.
x ^= neg;
x += neg &amp; 1;
x = x % p;

// test for 0.
z = x;

// Per suggestion by @fgrieu
// The following code fragment involves:
// 2 shifts, 4 bit-ops, and 2 additive ops.
z |= z &gt;&gt; 16;
z &amp;= 0xffffU;
z = -(1 ^ ((z ^ (z - 1)) &gt;&gt; 31));

// In comparison, the following requires:
// 5 shifts, 6 bit-ops, and 1 additive ops.
/* z |= z &gt;&gt; 16;
 * z |= z &gt;&gt; 8;
 * z |= z &gt;&gt; 4;
 * z |= z &gt;&gt; 2;
 * z |= z &gt;&gt; 1;
 * z = -(z &amp; 1);
 */

neg &amp;= z;

// conditionally compute p - x.
x = ((~neg &amp; x) | (neg &amp; p)) - (neg &amp; x);
return x;

}

DannyNiu
  • 10,640
  • 2
  • 27
  • 64