My book says that if we want to compute $\frac{e^x-1}{x}$ for $x$ near $0$, the following algorithm is a bad idea:
z1 = (exp(x) - 1)/x
while the following one, where y = exp(x) is computed first, is good:
z2 = (y - 1)/log(y)
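For reference, a standard Taylor expansion shows that the quantity being computed tends to $1$, which is why the simulation below measures the error against $1$: \begin{equation*} \frac{e^x-1}{x} = 1 + \frac{x}{2} + \frac{x^2}{6} + O(x^3) \longrightarrow 1 \quad \text{as } x \to 0. \end{equation*}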
I have proved that the first algorithm generates a large error (see below); now I want to prove that the second algorithm generates a small error.
The following MATLAB simulation confirms that what I want to prove is true:
function myNotableLimit
% Compare two ways of computing (e^x - 1)/x for x near 0.
ns = 10.^(-16:0.25:0);     % values of x, logarithmically spaced in [1e-16, 1]
err1 = zeros(length(ns),1);
err2 = zeros(length(ns),1);
value1 = zeros(length(ns),1);
value2 = zeros(length(ns),1);
for i = 1:length(ns)
    x = ns(i);
    z1 = (exp(x) - 1)/x;   % bad method: direct formula
    y = exp(x);
    z2 = (y - 1)/log(y);   % good method: go through y = exp(x)
    value1(i) = z1;
    value2(i) = z2;
    err1(i) = abs(1 - z1); % absolute error against the limit 1
    err2(i) = abs(1 - z2);
end
loglog(ns,err1,'r.-',ns,err2,'b.-',ns,value1,'r.-',ns,value2,'b.-');
h_legend = legend('error bad method','error good method','bad method','good method');
set(h_legend,'FontSize',20);
xlabel('x','FontSize',20);
ylabel('\epsilon','FontSize',20);
end
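As a quick spot check at a single point (assuming IEEE double precision), the same comparison can be run at the MATLAB prompt; the exact digits depend on the machine, but the first result is visibly contaminated by cancellation while the second is not:

x = 1e-15;
z1 = (exp(x) - 1)/x   % polluted by cancellation in exp(x) - 1
y = exp(x);
z2 = (y - 1)/log(y)   % the rounding errors of y - 1 and log(y) largely cancel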
You can see that the second method is much better. Below is my proof that the first method is bad (please tell me if you think it is correct), together with my attempt to prove that the second method is good. I have not been able to complete that proof, but I will write down my reasoning anyway.
Proof 1: "the first method is bad". Write $fl(\cdot)$ for the result of a floating point operation and $\delta$ for a generic rounding error, a different one at each occurrence, with $\left\vert\delta\right\vert \leq \frac{1}{2}2^{1-t}$. The computed value is \begin{equation*} \hat{z}_1 = fl\left(\frac{fl\big(fl(e^x) - 1\big)}{x}\right) = \frac{\big(e^x(1+\delta)-1\big)(1+\delta)}{x}(1+\delta) = \frac{e^x-1}{x}(1+\delta)^2 + \frac{e^x\delta}{x}(1+\delta)^2 \approx \frac{e^x-1}{x} + 2\delta\frac{e^x-1}{x} + \frac{e^x\delta}{x}. \end{equation*}
Since $2\delta\frac{e^x-1}{x}$ is small, the only term that can generate a big error is $\frac{e^x\delta}{x}$. This is indeed the case; in fact we have \begin{equation*} \frac{e^x\delta}{x} = \frac{\delta\big(1 + x + o(x)\big)}{x} = \frac{\delta}{x} + \delta + o(\delta). \end{equation*} The term that can raise an error is $\frac{\delta}{x}$. If the smallest positive normalised representable number $x_{\text{min}} = 2^{e_{\text{min}}-1}$ is much smaller than the roundoff unit $\frac{1}{2}2^{1-t}$ ($t$ is the number of binary significant digits and $e_{\text{min}}$ is the smallest exponent available in the machine representation; for instance, in double precision $t = 53$ and $e_{\text{min}} = -1021$), then we can have a big error. This happens when $e_{\text{min}} < 1 - t$; in fact, \begin{equation*} x_{\text{min}} = 2^{e_{\text{min}}-1} < \frac{1}{2}2^{1-t} = 2^{-t} \quad \Leftrightarrow \quad e_{\text{min}} < 1 - t. \end{equation*} We conclude that with the first method the error can be huge. If we also take into account denormalised floating point numbers (as modern architectures do), i.e. numbers smaller than $x_{\text{min}}$, which do not have a normalised mantissa, then the error can become up to $2^{t-1}$ times larger.
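A minimal sketch of this blow-up in IEEE double precision (so $t = 53$ and the roundoff unit is $2^{-53}$): for $x$ below the roundoff unit, $e^x$ rounds to exactly $1$, the numerator vanishes, and the first method returns $0$, a relative error of $1$:

x = 2^-60;            % well below the roundoff unit 2^-53
z1 = (exp(x) - 1)/x   % exp(x) rounds to exactly 1, so z1 is 0 instead of ~1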
Proof 2 (attempt): "the second method is good"
The calculations at the beginning are similar to the previous case, so I will be brief: pushing the three rounding errors of the subtraction, the logarithm and the division into the numerator, \begin{equation*} \hat{z}_2 = \frac{\big(y(1+\delta)-1\big)(1+\delta)^3}{\log y} = \frac{y-1}{\log y} + \frac{y-1}{\log y}3\delta + \frac{y\delta}{\log y} + o(\delta). \end{equation*} Trouble can arise only from the term $\frac{y\delta}{\log y}$, since the others are comparable to $\delta$, which by definition has absolute value smaller than the roundoff unit, i.e. $\left\vert\delta\right\vert \leq \frac{1}{2}2^{1-t}$.
Since $\log y \approx y - 1$ for $y$ near $1$, we can write \begin{equation*} \frac{y\delta}{\log y} \approx \frac{y\delta}{y-1}. \end{equation*} The smallest machine number $y$ greater than $1$ is $1$ plus the machine epsilon, i.e. $1 + 2^{1-t}$, so the smallest nonzero value of $y - 1$ is $2^{1-t}$. We therefore get \begin{equation*} \max\left\vert\frac{y\delta}{y-1}\right\vert = \left\vert\frac{(1+2^{1-t})\delta}{2^{1-t}}\right\vert \approx \left\vert\frac{\delta}{2^{1-t}}\right\vert \leq \frac{\frac{1}{2}2^{1-t}}{2^{1-t}} = \frac{1}{2}. \end{equation*} So in this case the error is much smaller than before, but it can still be large: $\frac{1}{2}$ is a large error if the result we expect is $1$!
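As a minimal check of this worst case in IEEE double precision (where the machine epsilon is eps = $2^{-52}$, i.e. $2^{1-t}$ with $t = 53$), one can evaluate the good method at the smallest machine number above $1$; consistently with the simulation above, the computed ratio stays close to $1$:

y = 1 + eps;          % smallest double strictly greater than 1
z2 = (y - 1)/log(y)   % stays close to 1 even though y - 1 is a single ulp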
How would you prove that the error is small?
