How does conversion from fixed-point to floating-point happen?

Question

I came across code that converts a 32-bit signed fixed-point number (16.16) to a float. It looks like this (pseudocode):

floating = fixed / 65536.0

Could you please explain the point of dividing by this number? Why does this division work when fixed-point and floating-point numbers have different internal structures?

Yuval Filmus · Accepted Answer · 2017-10-15T13:47:36.867

Your code converts a fixed-point number into the value it represents. The same formula also works for converting a fixed-point number to a rational number, for example.

A fixed-point number of the form $16.16$ consists of 32 binary digits, the first 16 to the left of the binary point and the last 16 to its right. If you read the 32 bits as an ordinary signed integer, inserting the binary point amounts to dividing that integer by $2^{16} = 65536$.

Here is a decimal example. Consider a number stored in decimal fixed-point $2.2$. What is the value of the number stored as $1234$? It is $12.34 = 1234/100$.
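The same reasoning in binary can be sketched in Python. This is a minimal illustration, assuming the 16.16 value arrives as a plain 32-bit pattern; the helper names (`to_signed32`, `fixed_16_16_to_float`, `fixed_16_16_to_fraction`) are made up for the example. It interprets the bits as a signed integer and divides by $2^{16}$, once into a float and once into an exact rational.

```python
from fractions import Fraction

SCALE = 1 << 16  # 2**16 = 65536 for the 16.16 format


def to_signed32(bits: int) -> int:
    """Interpret an unsigned 32-bit pattern as a two's-complement int32."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def fixed_16_16_to_float(raw: int) -> float:
    """Value of a signed 16.16 fixed-point number as a float."""
    return raw / SCALE  # Python 3 true division always returns a float


def fixed_16_16_to_fraction(raw: int) -> Fraction:
    """Same value as an exact rational number."""
    return Fraction(raw, SCALE)


print(fixed_16_16_to_float(0x00018000))               # 1.5
print(fixed_16_16_to_float(to_signed32(0xFFFE8000)))  # -1.5
print(fixed_16_16_to_fraction(0x00018000))            # 3/2
```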

zoran404 · Answer 2 · 2023-03-01T02:11:11.977

Your code snippet is a bit misleading.

floating = fixed / 65536.0

It would be more correct to write it as:

floating = float(fixed) / float(65536)

Here float() stands for the CPU's built-in instruction for converting an int to a float.

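In C/C++ the conversion to floating point is implicit: because 65536.0 is a double literal, the usual arithmetic conversions turn fixed into a double before the division, so the conversion is easily overlooked.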

But how does all this actually work?

When converting a fixed-point number to a floating-point number, you have to copy the most significant bits of the fixed-point number (starting from its highest set bit) into the mantissa and calculate the exponent from the position of the bits you copied.
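Those two steps can be sketched by hand for IEEE 754 single precision. This is a simplified illustration, not how any particular CPU does it: the function name `fixed_to_float32_manual` is hypothetical, and it truncates the bits that do not fit instead of rounding to nearest as real hardware does. Python's `struct` module is only used to turn the assembled 32-bit pattern back into a float.

```python
import struct

FRAC_BITS = 16  # 16.16 format: 16 fractional bits


def fixed_to_float32_manual(raw: int) -> float:
    """Build float32 bits by hand from a signed 16.16 value (truncating, not rounding)."""
    if raw == 0:
        return 0.0
    sign = 1 if raw < 0 else 0
    mag = -raw if raw < 0 else raw
    top = mag.bit_length() - 1  # position of the leading 1 bit
    # Keep the leading 1 plus the next 23 bits: 24 significant bits in total.
    if top > 23:
        mantissa = mag >> (top - 23)  # extra low bits are simply dropped here
    else:
        mantissa = mag << (23 - top)
    # The binary point sits FRAC_BITS bits from the right, so the exponent
    # is the leading bit's position measured from the binary point.
    exponent = top - FRAC_BITS
    bits = (sign << 31) | ((exponent + 127) << 23) | (mantissa & 0x7FFFFF)
    return struct.unpack('<f', struct.pack('<I', bits))[0]


print(fixed_to_float32_manual(0x00018000))   # 1.5
print(fixed_to_float32_manual(-0x00018000))  # -1.5
```

For 0x7FFFFFFF this truncating version gives 32767.998046875, whereas a round-to-nearest conversion gives 32768.0; that last-bit difference is the price of the simplification.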

In your case you are storing the fixed-point number in an int32 variable, which allows you to exploit the CPU's built-in operation for converting an int to a float.

But the CPU thinks it's working with an int and does not know that some of the bits store the fraction, so it returns a floating-point number with the wrong exponent.

Since our fixed-point number uses 16 bits to store the fraction, we can fix our float by decrementing its exponent by 16.
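In Python this "convert as an int, then fix the exponent" step can be shown with `math.frexp` and `math.ldexp`, where `ldexp(x, -16)` multiplies by $2^{-16}$ by adjusting the exponent. Note that `frexp` normalizes the mantissa into [0.5, 1) rather than IEEE's [1, 2), so its exponents are one higher than the IEEE ones; the difference of 16 is what matters. The bit pattern 0x00018000 (1.5 in 16.16) is just an example value.

```python
import math

raw = 0x00018000                  # 16.16 bit pattern of 1.5 (integer value 98304)
as_int = float(raw)               # plain int-to-float conversion: 98304.0
print(math.frexp(as_int))         # (0.75, 17): exponent is 16 too large
fixed = math.ldexp(as_int, -16)   # subtract 16 from the binary exponent
print(math.frexp(fixed))          # (0.75, 1): same mantissa, corrected exponent
print(fixed, fixed == raw / 65536.0)  # 1.5 True
```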

Floating-point division works by dividing the mantissas and subtracting the exponents, followed by normalization and rounding of the result.

The floating-point number 65536, i.e. 2^16, has an exponent of 16 and a mantissa of 1.

By dividing by 65536 we decrement our exponent by 16 and divide our mantissa by 1 (which does nothing), so the division itself is exact and introduces no rounding error.
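One way to check that dividing by 65536 only touches the exponent is to compare float32 bit patterns. The sketch below assumes IEEE 754 single precision, accessed through Python's `struct` module with format `'f'`; the helper names are hypothetical. It converts the integer to float32 and then subtracts 16 directly from the exponent field (bits 23 to 30), and compares the result with the float32 rounding of `raw / 65536.0`.

```python
import struct

EXP_SHIFT = 23  # the exponent field of a float32 starts at bit 23


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 single precision."""
    return struct.unpack('<I', struct.pack('<f', x))[0]


def bits_to_f32(b: int) -> float:
    """Float value of a 32-bit IEEE 754 single-precision pattern."""
    return struct.unpack('<f', struct.pack('<I', b))[0]


def divide_by_65536_via_exponent(raw: int) -> float:
    """int32 -> float32, then subtract 16 from the exponent field.

    Assumes raw != 0: the biased exponent of float(raw) is then at least 127,
    so subtracting 16 cannot underflow or borrow into the sign bit.
    """
    b = f32_bits(float(raw))
    return bits_to_f32(b - (16 << EXP_SHIFT))


for raw in (1, 0x00018000, -0x00018000, 0x7FFFFFFF, -0x80000000):
    direct = bits_to_f32(f32_bits(raw / 65536.0))
    print(raw, divide_by_65536_via_exponent(raw), direct)
```

Both columns agree for every sample, which is the point of the answer: the division leaves the mantissa bits alone.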

It's a pretty smart piece of code, for what it's worth.

This trick can also be used to convert very large fixed-point numbers: first shift the fixed-point number to the right until it fits into an int32 (or an int64, if available), then apply your code snippet, and finally increment the exponent of the result by the number of bits you shifted out.
Of course you lose bits by shifting, but those would not fit into the float's mantissa anyway (at most they could affect the rounding of the last mantissa bit).
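A sketch of that procedure, assuming a fixed-point number with 16 fraction bits stored in an arbitrarily wide integer. Python's `float()` can already convert big integers directly, so this only simulates the steps; the name `big_fixed_to_float` and the 63-bit limit (the magnitude bits of a signed int64) are assumptions for illustration. The sign is handled separately so the right shift always truncates the magnitude.

```python
import math

FRAC_BITS = 16  # fraction bits, as in 16.16
MAX_BITS = 63   # magnitude bits that fit in a signed int64


def big_fixed_to_float(raw: int) -> float:
    """Convert a wide fixed-point integer with FRAC_BITS fraction bits to a float."""
    mag = abs(raw)
    # Shift right until the magnitude fits into an int64.
    shift = max(0, mag.bit_length() - MAX_BITS)
    small = mag >> shift
    # Apply the original snippet: convert to float, divide by 2**FRAC_BITS.
    value = float(small) / float(1 << FRAC_BITS)
    # Add the shifted-out bit count back to the exponent.
    value = math.ldexp(value, shift)
    return -value if raw < 0 else value


print(big_fixed_to_float(0x00018000))  # 1.5
print(big_fixed_to_float(3 << 100))    # 3 * 2**84
```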
