I want to perform a simple linear interpolation between $A$ and $B$ (which are binary floating-point values) using floating-point math with IEEE-754 round-to-nearest-or-even rounding rules, as accurately as possible. Please note that speed is not a big concern here.
I know of two basic approaches. I'll use the symbols $\oplus, \ominus, \otimes, \oslash$ following Knuth [1] to mean floating-point addition, subtraction, multiplication and division, respectively (I don't actually use division, but I've listed it for completeness).
(1) $\quad f(t) = A\,\oplus\,(B\ominus A)\otimes t$
(2) $\quad f(t) = A\otimes(1\ominus t)\,\oplus \,B\otimes t$
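For concreteness, here is a minimal C sketch of the two methods (the names `lerp1`/`lerp2` are mine); it assumes double precision, the default round-to-nearest-even mode, and compilation with floating-point contraction disabled (e.g. `-ffp-contract=off`) so the expressions round exactly as written:

```c
#include <stdio.h>

/* Method (1): A + (B - A)*t, evaluated in double precision. */
static double lerp1(double a, double b, double t) {
    return a + (b - a) * t;
}

/* Method (2): A*(1 - t) + B*t. */
static double lerp2(double a, double b, double t) {
    return a * (1.0 - t) + b * t;
}

int main(void) {
    double a = 1.0, b = 3.0;
    printf("lerp1 at t=0,1: %g %g\n", lerp1(a, b, 0.0), lerp1(a, b, 1.0));
    printf("lerp2 at t=0,1: %g %g\n", lerp2(a, b, 0.0), lerp2(a, b, 1.0));
    return 0;
}
```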
Each method has its pros and cons. Method (1) is clearly monotonic in $t$, which is a very interesting property, while it is not at all obvious to me that the same holds for method (2), and I suspect it may not. On the other hand, method (2) has the advantage that when $t = 1$ the result is exactly $B$, not an approximation, which is also a desirable property (and it is exactly $A$ when $t = 0$, but method (1) does that too). That follows from the properties listed in [2], in particular:
$u\oplus v = v\oplus u$
$u\ominus v = u\oplus -v$
$u\oplus v = 0$ if and only if $v = -u$
$u\oplus 0 = u$
$u\otimes 1 = u$
$u\otimes v = 0$ if and only if $u = 0$ or $v = 0$
Indeed, at $t = 1$ method (2) computes $A\otimes(1\ominus 1)\oplus B\otimes 1 = A\otimes 0\oplus B = 0\oplus B = B$ exactly. In [3] Knuth also discusses this case:
$u' = (u\oplus v)\ominus v$
which implicitly means that $u'$ may or may not equal $u$. Replacing $u$ with $B$ and $v$ with $-A$ and using the above rules, it follows that $A\oplus(B\ominus A) = B$ is not guaranteed, meaning that method (1) does not always produce exactly $B$ when $t = 1$.
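A quick double-precision illustration of this (my example, not Knuth's): with $u = 1$ and $v = 10^{-16}$, which is below half an ulp of 1, the addition rounds back to 1 and the subsequent subtraction rounds down, so $u' \neq u$:

```c
#include <stdio.h>

int main(void) {
    double u = 1.0, v = 1e-16;   /* v < ulp(1)/2 = 2^-53 */
    double up = (u + v) - v;     /* u + v rounds to exactly 1 */
    printf("u  = %a\nu' = %a\n", u, up);
    /* prints u = 0x1p+0, u' = 0x1.fffffffffffffp-1 */
    return 0;
}
```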
So, here come my questions:
- Is method (2) guaranteed to be monotonic?
- If not, is there a better method that is accurate, monotonic and yields $A$ when $t = 0$ and $B$ when $t = 1$?
- If not (or you don't know), does method (1) when $t = 1$ always overshoot (that is, $A\oplus(B\ominus A)=A+(B-A)\cdot t$ for some $t \geq 1$)? Always undershoot (ditto for some $t \leq 1$)? Or sometimes overshoot and sometimes undershoot?
I assume that if method (1) always undershoots, I can make a special case when $t = 1$ to obtain the desired property of being exactly equal to $B$ when $t = 1$, but if it always overshoots, then I can't. That's the reason for question 3.
EDIT: I've found that the answer to question 3 is that it sometimes overshoots and sometimes undershoots. For example, in double precision:
-0x1.cae164da859c9p-1 + (0x1.eb4bf7b6b2d6ep-1 - (-0x1.cae164da859c9p-1))
results in 0x1.eb4bf7b6b2d6fp-1, which is 1 ulp greater than the original, while
-0x1.be03888ad585cp-1 + (0x1.0d9940702d541p-1 - (-0x1.be03888ad585cp-1))
results in 0x1.0d9940702d540p-1, which is 1 ulp less than the original. On the other hand, the method that I planned (special-casing $t=1$) won't fly, because I've found cases where $t < 1$ and yet $A\oplus(B\ominus A)\otimes t > B$, for example:
t = 0x1.fffffffffffffp-1
A = 0x1.afb669777cbfdp+2
B = 0x1.bd7b786d2fd28p+1
$A \oplus (B \ominus A)\otimes t =\,$ 0x1.bd7b786d2fd29p+1
which means that if method (1) is to be used, the only strategy that may work is clamping.
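For reproducibility, here is a small C harness (mine, not from the original post) that checks the three double-precision counterexamples above; compile with floating-point contraction disabled (e.g. `-ffp-contract=off`) so the expressions round exactly as written:

```c
#include <stdio.h>

int main(void) {
    /* Overshoot at t = 1: expect 1 ulp above b1, per the post. */
    double a1 = -0x1.cae164da859c9p-1, b1 = 0x1.eb4bf7b6b2d6ep-1;
    printf("%a (B = %a)\n", a1 + (b1 - a1), b1);

    /* Undershoot at t = 1: expect 1 ulp below b2. */
    double a2 = -0x1.be03888ad585cp-1, b2 = 0x1.0d9940702d541p-1;
    printf("%a (B = %a)\n", a2 + (b2 - a2), b2);

    /* t < 1, yet the result exceeds B: expect 1 ulp above b3. */
    double t  = 0x1.fffffffffffffp-1;
    double a3 = 0x1.afb669777cbfdp+2, b3 = 0x1.bd7b786d2fd28p+1;
    printf("%a (B = %a)\n", a3 + (b3 - a3) * t, b3);
    return 0;
}
```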
Update: As noted by Davis Herring in a comment and later checked by me, special-casing $t=1$ actually works.
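Given that, this is a sketch of what the final routine might look like (the name is mine; a sketch relying on the update's claim, not a proof of monotonicity):

```c
/* Method (1) plus the special case at t = 1 that, per the update,
   preserves monotonicity while making both endpoints exact. */
static double lerp_exact_ends(double a, double b, double t) {
    if (t == 1.0) return b;    /* exact at t = 1 by construction */
    return a + (b - a) * t;    /* exact at t = 0, since (b-a)*0 adds zero */
}
```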
References
[1] D. E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 3rd ed., p. 215.
[2] Op. cit., pp. 230–231.
[3] Op. cit., p. 235, eq. (41).
Comments

- …t as volatile fixed it. I've updated the program. – Pedro Gimeno Aug 30 '14 at 20:51
- …`fma(t, b, fma(-t, a, a))`… – njuffa May 21 '16 at 21:45
- `A=0x1.FC5A90p+13, B=0x1.4BB814p+5, t=0x1.8B212Ap-17` results in `0x1.FC5908p+13`, and the same A and B with the next t, which is `0x1.8B212Cp-17`, give `0x1.FC590Ap+13`, which is greater than the previous value, but it should be descending because A > B. Perhaps an alternative is to calculate the error in B−A and add that later, something like: `fma(t, err(B-A), fma(t, B-A, A))`. I'll test. – Pedro Gimeno May 22 '16 at 21:29
- `A=0x1.B1B374p+41, B=-0x1.A6404Ep+43, t=1.0` yields `-0x1.A64050p+43`. Turning err and B−A around doesn't help, more like the contrary. For reference, `err(u+v) = fabs(u)>=fabs(v) ? u-(u+v)+v : v-(u+v)+u` (Knuth, theorem 4.2.2-C). Here, u=B, v=−A. – Pedro Gimeno May 23 '16 at 22:41
- `a=0x1.fc5a90p+13 b=0x1.4bb814p+5 t=0x1.8b212ap-17 t2=0x1.8b212cp-17 res=0x1.fc590ap+13 res2=0x1.fc590ap+13` – njuffa May 25 '16 at 16:08
- …`b-a` is exact or `t*(b-a)` is rounded in the correct direction from (the exact) b−a. In particular, note that for any 0 < f < 1 and (w.l.o.g.) x > 0, `0 < x*f < x` unless x is denormal. – Davis Herring Dec 18 '19 at 15:08
- …`lerp` that computes this linear interpolation operation. – Zsbán Ambrus Mar 27 '24 at 13:21
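For reference, njuffa's fma-based suggestion from the comments would look like this in C; note that the follow-up comments exhibit single-precision inputs for which it is still not monotonic:

```c
#include <math.h>

/* fma-based variant from the comments: t*b + (a - t*a), with each
   step fused. Exact at t = 0 and t = 1, but not guaranteed monotonic. */
static double lerp_fma(double a, double b, double t) {
    return fma(t, b, fma(-t, a, a));
}
```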