22

I was learning about fast exponentiation when I found this algorithm:

int expo(int a, int b) {
  int result = 1;

  while (b) {
    if (b%2==1) {
      result *= a;
    }
    b /= 2;
    a *= a;
  }

  return result;
}

This is supposed to compute $a^b$ in logarithmic time, but I couldn't understand, or find anywhere, a way to arrive at this procedure, other than by noting the following identity (which didn't help me understand why the algorithm is correct):

$$ x^n = \begin{cases} 1 & \text{if $n=0$} \\ \left(\frac{1}{x}\right)^{-n} & \text{if $n<0$} \\ x \cdot \left( x^{\frac{n-1}{2}} \right)^2 & \text{if $n$ is odd} \\ \left( x^{\frac{n}{2}} \right)^2 & \text{if $n$ is even} \end{cases} $$

From what I understand, the transition from this particular identity to the actual algorithm should be quite obvious, but I honestly don't see it, even though I've worked through quite a few examples of the algorithm by hand.

Can anyone explain?

senshin
  • 513
jayno
  • 231
  • Run the loop by hand for, say, $b = 5$. Hint: $a^5 = a^{4+1} = (a^2)^2 \cdot a$. Note that b%2 == 1 when the $n^{th}$ bit is set in the binary representation of $b$. – dxiv Aug 20 '16 at 16:46
  • Does that amount to anything more than the fact that $b$ is odd in that case? – jayno Aug 20 '16 at 16:54
  • Yes. Hope the posted answer explains it better. – dxiv Aug 20 '16 at 17:09
  • Have you read the wikipedia article on this algorithm? https://en.wikipedia.org/wiki/Exponentiation_by_squaring – BlueRaja - Danny Pflughoeft Aug 20 '16 at 20:46
  • It would be very helpful if you could edit your answer to include where you stumbled across this snippet – polynomial_donut Nov 24 '19 at 16:48
  • The identity in the question relates $x^n$ to lower exponents ($n/2$ etc) and forms the basis of Left-to-Right Binary Exponentiation, which processes bits of the exponent ($n$) MSB to LSB. But the algorithm posted is Right-to-Left Binary algorithm (and the relevant identities already available in answers). – Nitin Verma Aug 06 '21 at 08:58

6 Answers

26

Write $b = b_0 + b_1 2 + b_2 2^2 + ... + b_n 2^n$, where the $b_k$ are the digits of $b$ in base $2$. Note that $n$ grows logarithmically with $b$. Then:

$$ a^b = a^{b_0} \cdot (a^2)^{b_1} \cdot (a^{2^2})^{b_2} \cdot ... \cdot (a^{2^n})^{b_n} $$

The terms where bits $b_k = 0$ evaluate to $1$ and can be skipped. The terms with $b_k = 1$ are of the form $a^{2^k}$ and can be calculated by repeatedly squaring $a$ $k$ times. This is precisely what the posted code does.
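To make the correspondence with the posted code concrete, here is that code instrumented to print which factors $a^{2^k}$ get multiplied in (a sketch; `expo_traced` is an illustrative name, not from the question):

```c
#include <stdio.h>

/* The question's loop, instrumented: one factor a^(2^k) is
 * multiplied into result for each set bit b_k of the exponent,
 * exactly matching the product formula above. */
int expo_traced(int a, int b) {
    int result = 1, k = 0;
    while (b) {
        if (b % 2 == 1) {
            printf("bit %d set: multiply by a^(2^%d) = %d\n", k, k, a);
            result *= a;
        }
        b /= 2;
        a *= a;   /* a now holds (original a)^(2^(k+1)) */
        k++;
    }
    return result;
}
```

For $b = 13 = 1101_2$ the set bits are $0$, $2$, $3$, so the factors are $a^1$, $a^4$, $a^8$, whose product is $a^{13}$.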

dxiv
  • 77,867
6

The algorithm uses the binary expansion of the exponent to reduce the number of multiplications one has to do. If you take $a$ and square it, then square the result, and so on, you produce the numbers $a, a^2, a^4, a^8,\ldots,a^{2^k}$ until $2^{k+1}>b$. So that takes about $\log_2 b$ multiplications. Then if you express $b$ in binary, say $b=1101001$, then $a^b = a^{2^6+2^5+2^3+2^0} = a^{2^6}a^{2^5}a^{2^3}a^{2^0}$, and you've just computed all of these powers of $a$, so multiply them together; that's about $\log_2 b$ more multiplications.
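The two phases described here can be written out literally (a sketch; `expo_two_phase` is an illustrative name, and overflow is ignored as in the question's code):

```c
/* Phase 1: tabulate squares[k] = a^(2^k) by repeated squaring.
 * Phase 2: multiply together the entries whose bit is set in b.
 * Each phase costs about log2(b) multiplications. */
int expo_two_phase(int a, int b) {
    int squares[32];              /* enough for a 32-bit exponent */
    squares[0] = a;
    for (int k = 1; (1 << k) <= b; k++)   /* stop once 2^k > b */
        squares[k] = squares[k - 1] * squares[k - 1];

    int result = 1;
    for (int k = 0; b; k++, b >>= 1)
        if (b & 1)                /* bit b_k is set */
            result *= squares[k];
    return result;
}
```

The code in the question fuses the two phases into one loop, so the table of squares never needs to be stored.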

B. Goddard
  • 33,728
  • Thank you very much. Could you please explain which part of the code picks out the binary bits that are 1 and then evaluates $a^{2^k}$? This is the part that still confuses me, but the overall approach is clear to me. – Avv Jun 03 '21 at 14:21
  • @Avra The code isn't really picking out the binary bits. It's using the method of converting to binary here: https://indepth.dev/posts/1019/the-simple-math-behind-decimal-binary-conversion-algorithms I think the best way is to work through a couple examples by hand. – B. Goddard Jun 03 '21 at 14:36
2

I hardly understand your code, but here is a description of the algorithm in pseudo-code. We denote by $S$ the successive values of the squares, and by $P$ the successive partial products of powers of the number $a$:

Input: $\;a, n$

Output: $\;a^n$

$S\leftarrow a$, $P\leftarrow 1$ ;

While $n>1$ do

$\qquad$If odd($n$) then $P\leftarrow P*S\enspace$ fi

$\qquad n\leftarrow \bigl\lfloor n/2\bigr\rfloor$

$\qquad S\leftarrow S^2$

Endwhile

$ P\leftarrow P*S$.

End
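A direct C transcription of this pseudo-code (a sketch; `expo_pseudo` is an illustrative name). Note that the accumulator $P$ must start at $1$: the loop ends when $n=1$, so the final multiplication by $S$ supplies the leading bit of $n$ (starting with $P=a$, already $n=2$ would come out as $a^3$):

```c
/* Pseudo-code from the answer above, for exponents n >= 1.
 * S runs through a, a^2, a^4, ...; P collects the factors for the
 * low bits; the final P*S accounts for the leading 1-bit of n. */
int expo_pseudo(int a, int n) {
    int S = a, P = 1;
    while (n > 1) {
        if (n % 2 == 1)   /* odd(n) */
            P *= S;
        n /= 2;           /* n <- floor(n/2) */
        S *= S;           /* S <- S^2 */
    }
    return P * S;         /* P <- P*S */
}
```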

Bernard
  • 179,256
  • do you find the code poorly formatted ? I'm sorry if that's the case, I will make appropriate edits. – jayno Aug 20 '16 at 17:11
  • That is not the point: I am only a mathematician, not a computer scientist, and I don't know the code. – Bernard Aug 20 '16 at 17:14
  • Somewhere in the translation from code to pseudocode some bugs crept in: when instantiated e.g. with n=2, a^4 is produced instead of a^2. – Jan Stout Aug 21 '16 at 00:39
  • @Jan Stout: You're right. I wrongly copied the order of the instructions in the loop. It's fixed now. Thanks for pointing it! – Bernard Aug 21 '16 at 01:04
1

If you want to do something in logarithmic time, you probably need to do something with the digits of the number in some base, rather than with the number itself. The digital representation of a number is inherently a logarithmy object. Additionally, since $b$ is the main thing that determines how big $a^b$ is, we're probably going to need to think of $b$ in a logarithmy way, or we're not going to have any hope of making our algorithm go logarithmic.

If you now consider $b$ in base $2$, this is the natural algorithm that falls out.

One may also be inspired by the binary search algorithm, which is to find an element in a sorted list. It works by dividing the list into two, determining whether the element is in the first (resp. second) half of the list, and repeating the algorithm using the now-half-sized list which is the first (resp. second) half of the original. This "divide and conquer" idea turns up all over the place if we want to do things in logarithmic time; the algorithm you have given is a natural answer to the question "can we divide-and-conquer this?".
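The divide-and-conquer reading corresponds directly to the recursive identity from the question; as a sketch in C for $n \ge 0$ (`expo_rec` is an illustrative name):

```c
/* Divide and conquer: halve the exponent, recurse, square the
 * result, and multiply by x once more when n is odd.  Recursion
 * depth, and hence multiplication count, is O(log n). */
int expo_rec(int x, int n) {
    if (n == 0)
        return 1;
    int half = expo_rec(x, n / 2);  /* x^(floor(n/2)) */
    if (n % 2 == 1)
        return x * half * half;     /* n odd:  x * (x^((n-1)/2))^2 */
    return half * half;             /* n even: (x^(n/2))^2 */
}
```

Unrolling this recursion from the least significant bit of $n$ upward gives the loop in the question.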

  • Maybe this should be a comment? I don't see how it solves or suggests an approach to the problem at hand. – jayno Aug 20 '16 at 17:06
  • @jayno I consider it to be a full answer to "This is supposed to compute $a^b$ in logarithmic time, but I couldn't understand or find anywhere a way to arrive at this procedure, other than by noting that…". What in particular is the problem you have, if not that, or is my explanation failing somehow? – Patrick Stevens Aug 20 '16 at 17:07
  • It seems that the problem is in my original post, sorry. I will edit. – jayno Aug 20 '16 at 17:14
0

Try to exponentiate by only adding exponents, and do so in the least number of steps. Then think about how to get back an actual exponent, so you can actually do the multiplication. The way it's done in the algorithm ensures that you can recycle every variable.

Let's define our function

slowExpo(x,y)

steps:

  1. Generate a sequence of exponents $y_i$ that sum up to $y$.
  2. Generate the sequence of powers $x^{y_i}$.
  3. Multiply together every member of the sequence from step 2.
  4. Return the output of step 3; it equals $x^y$.

This works because of the exponent addition laws.

$a^b * a^c = a^{b+c}$

The sequence generated in your algorithm is designed for binary and recycles every sequence member generated; it performs steps 1, 2 and 3 together, "in pseudo-parallel", in one loop. This reduces the amount of work, because the lists of sequence members never have to be stored and retrieved, and the powers $x^{y_i}$ don't have to be calculated recursively as slowExpo(x, y_i).
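The four steps above, written out literally and unfused (a sketch; `slowExpo` follows this answer's naming, the body is my guess at its intent, restricted to powers of two as in the binary case):

```c
/* Step 1: pick the powers of two y_i that sum to y.
 * Step 2: compute x^(y_i) for each, by repeated squaring.
 * Steps 3-4: multiply everything together; the product is x^y,
 * by the law a^b * a^c = a^(b+c). */
int slowExpo(int x, int y) {
    int exps[32], n = 0;
    for (int k = 0; (1 << k) <= y; k++)
        if (y & (1 << k))
            exps[n++] = 1 << k;          /* step 1: y_i */

    int pows[32];
    for (int i = 0; i < n; i++) {        /* step 2: x^(y_i) */
        int p = x;
        for (int e = exps[i]; e > 1; e /= 2)
            p *= p;
        pows[i] = p;
    }

    int result = 1;                      /* steps 3-4 */
    for (int i = 0; i < n; i++)
        result *= pows[i];
    return result;
}
```

Compared to the fused loop in the question, this version recomputes the repeated squarings for each $y_i$ and stores both sequences, which is exactly the work the one-loop version avoids.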

-1

Some comments on performance which sadly were too long for a comment.

  1. You can do bit shifts instead of division :

    b >>= 1; // same effect as b/=2, but uses logic instead of expensive integer div instr.
    

    This shifts all bits right one step. You may need to make sure the integer is non-negative (unsigned int), but it will definitely be faster than risking an integer "div" instruction.

  2. If your hardware can locate the highest set bit (e.g. via a "count leading zeros" instruction; the related "find first set bit" instruction is, for some reason, called ffs), you can use that to get a deterministic bound for your loop, which will remove branchery at the cost of about one pipelineable cpu cycle.

  3. Avoid ifs in loops; it may well be better to do

    result *= 1 + (b & 0x1) * (a - 1); // a when the low bit is set, 1 otherwise
    

    replacing the if with logic and arithmetic. (The more obvious result *= a*(b&0x1) would be wrong: it zeroes out result whenever the bit is clear.) Multiplications are very pipelineable and branches risk breaking pipelines. Help the compiler to make the instructions as predictable as possible.
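Putting the tips together (a sketch; `expo_branchless` is an illustrative name, using unsigned as suggested in tip 1; the branchless update relies on 1 + (b & 1)*(a - 1) evaluating to a when the low bit is set and to 1 otherwise):

```c
/* Same algorithm as the question, with the if replaced by
 * arithmetic and the division replaced by a right shift. */
unsigned expo_branchless(unsigned a, unsigned b) {
    unsigned result = 1;
    while (b) {
        result *= 1u + (b & 1u) * (a - 1u);  /* no branch in the loop body */
        b >>= 1;                             /* instead of b /= 2 */
        a *= a;
    }
    return result;
}
```

Whether this beats the branching version in practice depends on the compiler and CPU; modern compilers often perform this transformation themselves, so it's worth measuring.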

mathreadler
  • 26,534