I want to calculate $m^{e} \bmod N$ in C or C++ with a 2048-bit modulus $N$. At that size the exponentiation requires a huge amount of computation and time, so I want to know which optimizations I can use.
In particular, I came across this approach:
Convert the exponent to its binary representation and loop over its bits from the MSB to the LSB. At each step, square the result mod $N$, and if the current bit of the exponent is 1, also multiply the result by $m$ (mod $N$).
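A minimal sketch of that left-to-right square-and-multiply loop, assuming 64-bit operands instead of 2048-bit ones so it runs without a bignum library. `unsigned __int128` is a GCC/Clang extension used only to hold the double-width product; the helper names `mulmod` and `powmod` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) mod n for 64-bit values; the 128-bit intermediate cannot overflow.
 * For a 2048-bit N this would be replaced by multi-precision arithmetic. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)(((unsigned __int128)a * b) % n);
}

/* Left-to-right binary exponentiation: m^e mod n. */
static uint64_t powmod(uint64_t m, uint64_t e, uint64_t n)
{
    assert(n > 1);              /* modulus must exceed 1 */
    uint64_t result = 1;
    m %= n;
    /* Scan the exponent from its most significant bit down to bit 0. */
    for (int i = 63; i >= 0; --i) {
        result = mulmod(result, result, n);   /* always square */
        if ((e >> i) & 1)
            result = mulmod(result, m, n);    /* multiply when the bit is 1 */
    }
    return result;
}
```

For example, `powmod(4, 13, 497)` evaluates to 445. The loop performs one squaring per exponent bit plus one multiplication per set bit, so for a 2048-bit exponent the cost is dominated by about 2048 modular squarings.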
I can also compute each squaring of the result with the Karatsuba algorithm.
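To illustrate the Karatsuba identity specialized to squaring, here is a single-level sketch on a 64-bit value split into two 32-bit halves, $x = x_1 2^{32} + x_0$, so that $x^2 = z_2 2^{64} + z_1 2^{32} + z_0$ with $z_2 = x_1^2$, $z_0 = x_0^2$ and $z_1 = (x_1 + x_0)^2 - z_2 - z_0$. It assumes GCC/Clang's `unsigned __int128`; the function names are hypothetical. At one level this only demonstrates the recurrence; the asymptotic savings appear when the three half-size squarings are themselves computed recursively on multi-limb (e.g. 2048-bit) numbers.

```c
#include <assert.h>
#include <stdint.h>

/* One level of Karatsuba squaring: three half-size squarings
 * (z0, z2 and (x0 + x1)^2) combined with additions and shifts. */
static unsigned __int128 karatsuba_square(uint64_t x)
{
    uint64_t x0 = x & 0xFFFFFFFFu;   /* low 32 bits  */
    uint64_t x1 = x >> 32;           /* high 32 bits */

    unsigned __int128 z0 = (unsigned __int128)x0 * x0;
    unsigned __int128 z2 = (unsigned __int128)x1 * x1;
    uint64_t s = x0 + x1;            /* at most 33 bits, fits in 64 */
    unsigned __int128 z1 = (unsigned __int128)s * s - z2 - z0;  /* = 2*x0*x1 */

    return (z2 << 64) + (z1 << 32) + z0;
}

/* Sanity check against the compiler's native 128-bit product. */
static void karatsuba_check(uint64_t x)
{
    assert(karatsuba_square(x) == (unsigned __int128)x * x);
}
```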
What other optimizations, if any, exist for this?
Is there an existing, optimized C or C++ library that implements this kind of modular arithmetic?