Where to apply Montgomery Multiplication in $GF(2^n)$?
The answer really depends on how you constructed the binary extension field $GF(2^n)$. If the irreducible polynomial is a trinomial or a pentanomial, then the reduction is already efficient!
Modular Multiplication with trinomials
Let $GF(2^m)$ be constructed with the trinomial $P(x)=x^m+x^n+1$, where $\alpha$ is a root of this polynomial and $1<n<m/2$. (In this section the extension degree is $m$, and $n$ is the exponent of the middle term.)
Now you want to calculate $C'(x) = C(x) \bmod P(x) $ where
$$C(x) = A(x)\cdot B(x) = \left( \sum_{i=0}^{m-1} a_i x^i \right)\
\cdot \left( \sum_{i=0}^{m-1} b_i x^i \right)$$
To reduce the multiplication cost, use a Karatsuba-Ofman multiplier. This may require some tuning: the method is recursive, and recursing all the way down to single bits may not be the best choice.
Once you have $$C(x) = \sum_{i=0}^{2m-2} c_i x^i$$ you need to reduce it. Since $P(\alpha)=0$, we can write
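As a rough illustration of the tuning mentioned above, here is a minimal Python sketch of Karatsuba-Ofman multiplication over $GF(2)[x]$. Polynomials are stored as integers (bit $i$ is the coefficient of $x^i$). The function names and the `threshold` parameter are assumptions made for this example; the cutoff of 64 bits is an arbitrary placeholder, not a measured optimum.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less product in GF(2)[x]; bit i of an int is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, threshold: int = 64) -> int:
    """Karatsuba-Ofman over GF(2)[x]; schoolbook below `threshold` bits (threshold >= 1)."""
    n = max(a.bit_length(), b.bit_length())
    if n <= threshold:
        return clmul_schoolbook(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, threshold)
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, threshold)
    # In characteristic 2 subtraction is XOR, so
    # mid ^ lo ^ hi == a_lo*b_hi + a_hi*b_lo.
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << (2 * half))
```

Three half-size products replace four, at the price of a few extra XORs; where the recursion should stop depends on the platform, which is why the cutoff is left as a parameter.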
$$\begin{align}
\alpha^m &= 1 + \alpha^n\\
\alpha^{m+1} &= \alpha + \alpha^{n+1}\\
\vdots & \quad \vdots\\
\alpha^{2m-3} &= \alpha^{m-3} + \alpha^{m+n-3}\\
\alpha^{2m-2} &= \alpha^{m-2} + \alpha^{m+n-2}\\
\end{align}$$
Substituting these identities into $C(\alpha)$ once, we get
$$\begin{align}
c'_0 &= c_0 + c_m\\
c'_1 &= c_1 + c_{m+1}\\
\vdots \;\;& \quad \vdots\\
c'_{n-1} &= c_{n-1} + c_{m+n-1}\\
c'_{n} &= c_{n} + c_{m+n} + c_{m}\\
c'_{n+1} &= c_{n+1} + c_{m+n+1} + c_{m+1}\\
\vdots \;\;& \quad \vdots\\
c'_{m-2} &= c_{m-2} + c_{2m-2} + c_{2m-n-2}\\
c'_{m-1} &= c_{m-1} + c_{2m-n-1}\\
c'_{m} &= c_{2m-n}\\
\vdots \;\;& \quad \vdots\\
c'_{m+n-3} &= c_{2m-3}\\
c'_{m+n-2} &= c_{2m-2}\\
\end{align}$$
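Looking at the above, note the following:
- The reduction is not complete: the $n-1$ coefficients $c'_m,\dots,c'_{m+n-2}$ still have to be reduced, which a second pass of the same substitution does.
- The operations are just XORs and index mapping.
- The mapping is actually a simple loop over the indices.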
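The folding described above can be sketched in a few lines of Python. This is only an illustration under the assumption that polynomials are stored as integers (bit $i$ is the coefficient of $x^i$); the name `reduce_trinomial` is made up for this example.

```python
def reduce_trinomial(c: int, m: int, n: int) -> int:
    """Reduce c(x) modulo x^m + x^n + 1, assuming 0 < n < m."""
    mask = (1 << m) - 1
    while c >> m:
        high = c >> m                       # coefficients c_m, c_{m+1}, ...
        # x^(m+k) = x^k + x^(n+k): fold the high part down twice
        c = (c & mask) ^ high ^ (high << n)
    return c
```

For a product of two reduced elements (degree at most $2m-2$) and $n<m/2$, the loop body runs at most twice: the first pass leaves terms up to degree $m+n-2$, and the second pass removes them.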
The cost of the multiplication is just $m^2$, plus at most roughly $2m$ additional XORs for the reduction!
By comparison, Montgomery modular multiplication costs $2m^2$ multiplications and $2m^2-3m-1$ XOR operations.
Modular Multiplication with arbitrary irreducible polynomial
When you have an arbitrary irreducible polynomial for the binary extension field, the above table is no longer effective. Then you can fall back on Montgomery.
Note that Montgomery modular multiplication can also use Karatsuba-Ofman to reduce the cost of the multiplication, which we did not take into account here.
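For reference, a bit-serial Montgomery product in $GF(2^m)$ can be sketched as follows. This is a minimal sketch, assuming the radix $R = x^m$, integer-encoded polynomials (bit $i$ is the coefficient of $x^i$), and a modulus with constant term 1; the names `mont_pro` and `poly_mod` are chosen for this example only.

```python
def poly_mod(c: int, p: int) -> int:
    """Plain long-division remainder in GF(2)[x] (used here only to get R^2 mod p)."""
    dp = p.bit_length()
    while c.bit_length() >= dp:
        c ^= p << (c.bit_length() - dp)
    return c


def mont_pro(a: int, b: int, p: int) -> int:
    """Return a(x) * b(x) * x^(-m) mod p(x), where m = deg p and p(0) = 1.

    Both inputs are assumed to be reduced (degree < m).
    """
    m = p.bit_length() - 1
    c = 0
    for i in range(m):
        if (a >> i) & 1:
            c ^= b        # add a_i * b(x)
        if c & 1:
            c ^= p        # clear the constant term so c is divisible by x
        c >>= 1           # exact division by x
    return c


# Example with the AES polynomial x^8 + x^4 + x^3 + x + 1 (an arbitrary pentanomial here).
p = 0x11B
m = p.bit_length() - 1
r2 = poly_mod(1 << (2 * m), p)            # R^2 mod p, precomputed once
a_mont = mont_pro(0x57, r2, p)            # 0x57 * R mod p (Montgomery form)
product = mont_pro(a_mont, 0x83, p)       # (0x57 * R) * 0x83 * R^(-1) = 0x57 * 0x83
```

No division by $P(x)$ appears in `mont_pro` itself; the reduction is replaced by conditional XORs with $P(x)$ and shifts, which is what makes it attractive when $P(x)$ has no special form.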
Some notes:
- See "How does Montgomery reduction work?" for the details of Montgomery (integer case, which is almost the same as for polynomials).
- If you need only one modular multiplication, you don't need to convert both operands to their Montgomery residue representation. With $MonPro(u,v) = u\,v\,R^{-1} \bmod P$, where $R$ is the Montgomery radix (commonly $R = x^m$), you can compute $$MonPro(a(x),MonPro(b(x),R^2 \bmod P)) = a(x)\,b(x) \bmod P$$ since the inner call yields $b(x)\,R \bmod P$. You can also use Montgomery in modular square-and-multiply.
- Usually we select the irreducible polynomial during the design. There is already a "Table of Low-Weight Binary Irreducible Polynomials" by HP, and there are methods for finding them; see "Finding irreducible polynomials over GF(2) with the fewest terms".
- Inversion via plain modular exponentiation is not an efficient method. The Itoh-Tsujii algorithm needs only $t \ll m$ multiplications and $m-1$ squarings, and it already beats the extended GCD.
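To make the Itoh-Tsujii remark concrete, here is a small Python sketch. It assumes integer-encoded polynomials (bit $i$ is the coefficient of $x^i$), $m \ge 2$, and a simple shift-and-XOR field multiplication; `gf_mul` and `itoh_tsujii_inverse` are names invented for this example. It computes $a^{-1} = a^{2^m-2} = \left(a^{2^{m-1}-1}\right)^2$ by building $\beta_k = a^{2^k-1}$ along the binary expansion of $m-1$.

```python
def gf_mul(a: int, b: int, p: int) -> int:
    """Multiply two reduced elements modulo p(x) by shift-and-XOR."""
    m = p.bit_length() - 1
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= p            # keep a reduced (degree < m)
    return r


def itoh_tsujii_inverse(a: int, p: int) -> int:
    """Inverse of a nonzero element of GF(2^m), m = deg p >= 2."""
    m = p.bit_length() - 1
    beta, k = a, 1                         # beta_1 = a^(2^1 - 1)
    for bit in bin(m - 1)[3:]:             # bits of m-1 after the leading 1
        t = beta
        for _ in range(k):                 # k squarings: beta_k^(2^k)
            t = gf_mul(t, t, p)
        beta = gf_mul(t, beta, p)          # beta_2k = beta_k^(2^k) * beta_k
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_mul(beta, beta, p), a, p)  # beta_(k+1) = beta_k^2 * a
            k += 1
    return gf_mul(beta, beta, p)           # final squaring: a^(2^m - 2)
```

The squarings add up to $m-1$, while the number of general multiplications grows only with the bit length and Hamming weight of $m-1$. In a real design the squarings would use a dedicated squaring routine rather than `gf_mul`.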