mean of two floating point numbers: why $a+\frac{b-a}{2}$ is better than $\frac{a+b}{2}$?

Question

$a$ and $b$ are the floating point representation of two real numbers with no constraints (they can be both negative or both positive or one positive and the other negative and so on).

I read in the Armadillo codebase, a linear algebra library, that the "robust" mean of the two $a$ and $b$ is computed as $a+\frac{b-a}{2}$ and not as the (naïve) $\frac{a+b}{2}$.

I imagine that it is more robust for what regard the overflow that it can happen when $a$ and $b$ are similar to half the largest representable value; moreover I feel that if $a\approx b$ then the cancellation error incurring in their subtraction will not be a problem.

But, how one can rigorously prove these intuitions?

@Arthur the linked code makes no assumptions about the sign of $a$ and $b$ and it does not compute the largest of the two. — Alessandro Jacopson, Aug 05 '20 at 12:31
It should be just to avoid exceeding INT_MAX = 2147483647 (or any maximum integer allowed) when calculating a+b. — VIVID, Aug 05 '20 at 12:35
@VIVID: But it is counter-productive in that regard if $a$ and $b$ have opposite signs. — TonyK, Aug 05 '20 at 12:36
If $a$, $b$ are of opposite signs, the robust mean makes the computation less robust ! (crossed with TonyK). — , Aug 05 '20 at 12:38
Then, to make it really robust if-else condition is needed to check if $ab>0$ or vice versa to choose the option — VIVID, Aug 05 '20 at 12:42
@VIVID: the post concerns floating-point numbers. The chances of avoiding overflow this way are much lower than for integer numbers. This even looks paranoid, unless there is a good reason. — , Aug 05 '20 at 12:44
@YvesDaoust https://gitlab.com/conradsnicta/armadillo-code/-/blob/9.900.x/include/armadillo_bits/op_median_meat.hpp#L380 — Alessandro Jacopson, Aug 05 '20 at 12:51
This use case is for computation of the median, in the case of an even number of elements. It indeed keeps the computation overflow-free for any representable pairs of values (of the same sign) in the given type. For floating-point arguments, just paranoid. For bytes, quite useful :-) — , Aug 05 '20 at 13:10
A similar case can be found in the secant root formula central to many root finding methods. There it is also suggested to replace $\frac{af_b-bf_a}{f_b-f_a}$ by $a-f_a\frac{b-a}{f_b-f_a}$ to avoid difficulties in the limit where $a\approx b$, $f_a=f(a)\approx 0$. It seems similarly difficult to construct cases where this becomes visibly relevant. — Lutz Lehmann, Aug 05 '20 at 13:28
@LutzLehmann do you have any reference to the suggestion you wrote about? — Alessandro Jacopson, Aug 05 '20 at 13:31
On second thought, in the first form one gets catastrophic cancellation in both numerator and denominator, the resulting quotient becomes inaccurate. While in the second form the catastrophic cancellation still may occur, the factor $f_a\approx 0$ makes the update small close to a root. — Lutz Lehmann, Aug 05 '20 at 16:59
Yes, it was relative to my first comment. For the original question, a more general case was extensively discussed in https://math.stackexchange.com/questions/907327/accurate-floating-point-linear-interpolation — Lutz Lehmann, Aug 05 '20 at 18:32
C++ P0811r3 states that $\frac{(a+b)}{2}$ can cause overflow, and that $a+\frac{(b-a)}{2}$ can cause overflow and produce incorrectly rounded results. $\frac{a}{2}+\frac{b}{2}$ prevents overflow but is not correctly rounded for subnormal inputs. It thus suggests switching between strategies: $\begin{cases}\frac{a}{2}+\frac{b}{2}&&\texttt{isnormal}(a),\texttt{&&},\texttt{isnormal}(b)\\frac{a+b}{2}&&\text{otherwise}\end{cases}$. — CAD97, Feb 07 '21 at 20:45

mean of two floating point numbers: why $a+\frac{b-a}{2}$ is better than $\frac{a+b}{2}$?

0 Answers0