Not an answer but too long for a comment.
An analytical expression of the $L^2$ norm seems very difficult (not to say impossible).
What I would suggest is to built the $[2,1]$ Padé approximant of $\tanh(x)$ around $a=\frac{x_0+x_1}2$. This would give
$$\tanh(x)=\frac{3 \tanh (a)+2 (x-a)+ (\tanh (a)-\coth (a))(x-a)^2}{3+ (3 \tanh (a)-\coth (a))(x-a)}+O\left((x-a)^4\right)$$
Trying for $x_0=1$ and $x_1=2$
$$\int_1^2 \Big[\tanh(x)- \text{Padé}\Big]^2\,dx=3.90537\times10^{-7}$$ seems to be already decent. For the same problem, a full optimization would give $1.15644\times10^{-8}$
$$\int_2^3 \Big[\tanh(x)- \text{Padé}\Big]^2\,dx=8.80730\times10^{-9}$$
$$\int_3^4 \Big[\tanh(x)- \text{Padé}\Big]^2\,dx=1.66518\times10^{-10}$$
$$\log\Bigg[\int_n^{n+1} \Big[\tanh(x)- \text{Padé}\Big]^2\,dx\Bigg]=-3.87288\, n-11.2126$$
May be, you could use the sum of these Padé approximants each of them being built for a short interval.
Edit
The key problem is that we cannot have even an explicit solution for
$$\int \left(\tanh (x)-\frac{a x}{b x+1}\right)^2 \, dx$$ There is one thing you could try taking into account the fact that the $[2,1]$ Padé approximant is $O\left((x-a)^4\right)$.
Let
$$\tanh(x)=\sum_{n=0}^3 b_n \,(x-a)^n$$ and define the $L^2$ norm
$$\Phi(b_0,b_1,b_2,b_3)=\int_{x_0}^{x_1} \Bigg[\tanh(x)-\sum_{n=0}^3 b_n \,(x-a)^n\Bigg]^2\,dx$$ developing the integrand, we have terms in $x^k$ $(k=0,\cdots,6)$ (no problem with these) and a series of integrals
$$I_n=\int x^n \, \tanh(x)\,dx \qquad \qquad k=0,1,2,3$$
They do not make much problems
$$I_0=\log (\cosh (x))$$
$$I_1=\frac{1}{2} \left(x \left(x+2 \log \left(e^{-2
x}+1\right)\right)-\text{Li}_2\left(-e^{-2 x}\right)\right)$$
$$I_2=-x \text{Li}_2\left(-e^{-2 x}\right)-\frac{1}{2} \text{Li}_3\left(-e^{-2 x}\right)+\frac{x^3}{3}+x^2 \log \left(e^{-2 x}+1\right)$$
$$I_3=\frac{1}{4} \left(-6 x^2 \text{Li}_2\left(-e^{-2 x}\right)-6 x
\text{Li}_3\left(-e^{-2 x}\right)-3 \text{Li}_4\left(-e^{-2 x}\right)+x^4+4 x^3 \log \left(e^{-2 x}+1\right)\right)$$ and
$$\int \tanh^2(x)\,dx=x-\tanh(x)$$ So, we have all the elements. Now solve
$$\frac{\partial \Phi}{\partial b_0}=\frac{\partial \Phi}{\partial b_1}=\frac{\partial \Phi}{\partial b_2}=\frac{\partial \Phi}{\partial b_3}=0$$ which makes a linear system of four equations for four unknowns.
So, we have the series expansion that we could transform as a $[2,1]$ Padé approximant around the midpoint $a=\frac {x_0+x_1}2$
$$\sum_{n=0}^3 b_n \,(x-a)^n\sim \frac{p_2 (x-a)^2 + p_1 (x-a) + p_0}{q_1 (x-a) + q_0}$$ with
$$p_2=b_2^2-b_1 b_3 \qquad p_1=b_1 b_2-b_0 b_3 \qquad p_0=b_0 b_2\qquad q_1=-b_3\qquad q_0=b_2$$
This procedure has been tried with $x_0=1$, $x_1=3$, $a=2$.
Using the original Padé approximant leads, for the $L^2$ norm, to a value equal to $1.67\times 10^{-4}$. The optimization of the series gives $3.57\times 10^{-6}$. The new Padé approximant leads to $3.33\times 10^{-4}$.
Much work for a more than bad result.