Best way to load a 64-bit integer to a double precision SSE2 register?

Question

What is the best/fastest way to load a 64-bit integer value into an xmm SSE2 register (as a double) in 32-bit mode?

In 64-bit mode, cvtsi2sd can be used, but in 32-bit mode, it supports only 32-bit integers.

So far I haven't found much beyond:

1. use fild, fstp to stack, then movsd to xmm register
2. load the high 32-bit portion, multiply by 2^32, add the low 32 bits

The first solution is slow; the second might introduce precision loss (edit: and it is slow anyway, since the low 32 bits have to be converted as unsigned...)

Any better approach?

Multiplying the top 32 bits by 2**32 in floating-point isn't going to truncate/round them. It's only when you add the low 32 bits to them that the sum gets rounded/truncated, and that's what you'll get with the first method anyway. Unless I'm missing something, these two methods are equivalent (except for performance). — Alexey Frunze, Mar 22 '13 at 12:16
FWIW gcc seems to use the first approach (fild, fst, movsd). — Paul R, Mar 22 '13 at 12:31
The 2nd option is actually slow. I mistakenly used cvtsi2sd for the low 32 bits, which is incorrect: they need to be converted as unsigned, for which no CPU instruction exists, so it is slow... — Eric Grange, Mar 22 '13 at 12:59
There is a trick with internal representation of IEEE doubles and magic constants, for example: http://software.intel.com/en-us/forums/topic/301988, but don't know about speed — MBo, Mar 22 '13 at 13:36
And better explanation (for unsigned) here: http://stackoverflow.com/questions/13734191/are-there-unsigned-equivalents-of-the-x87-fild-and-sse-cvtsi2sd-instructions — MBo, Mar 22 '13 at 13:54
There is indeed a very fast method that I use. It's related to MBo's suggestion. But it's very hacky and only works for a range of numbers. — Mysticial, Mar 22 '13 at 17:19
@Mysticial Yes, I know them, but in case of overflow, I want to preserve the high order bits (as an fild does), and not the first 52 low order bits. — Eric Grange, Mar 22 '13 at 20:24
Ah ic. Then you're probably out of luck. At least I'm not aware of anything else. — Mysticial, Mar 22 '13 at 20:37
Yet another reason why 32-bit is obsolete. BTW, for vector integer<->double, AVX512 will finally introduce packed 64-bit integer <-> double conversions. Until then, even in 64-bit mode, there's just been [CVTDQ2PD xmm1, xmm2/m64](http://www.felixcloutier.com/x86/CVTDQ2PD.html) which converts a pair of 32-bit integers. — Peter Cordes, Sep 18 '16 at 06:37
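
To make the signed/unsigned point from the comments concrete, here is a minimal C sketch of option 2 (not SSE2 code, just the arithmetic). The function names are hypothetical, and it assumes `double` is IEEE-754 binary64 and that casting an out-of-range `uint32_t` to `int32_t` wraps, as it does on mainstream compilers.

```c
#include <stdint.h>

/* Option 2 from the question: scale the high half, add the low half.
 * Names are made up for this sketch. */
static double from_halves(int32_t hi, uint32_t lo)
{
    /* Exact: multiplying by 2^32 only changes the exponent. */
    double high = (double)hi * 4294967296.0;
    /* Exact: every 32-bit unsigned value fits in a double's 53-bit mantissa. */
    double low = (double)lo;
    /* The only step that can round. */
    return high + low;
}

/* The mistake described in the comments: the low half converted as signed,
 * which is what a plain cvtsi2sd on the low 32 bits would do.
 * Assumes the uint32_t -> int32_t cast wraps (implementation-defined in C). */
static double from_halves_signed_low(int32_t hi, uint32_t lo)
{
    return (double)hi * 4294967296.0 + (double)(int32_t)lo;
}
```

For example, with hi = 1 and lo = 0x80000000 the first function gives 2^32 + 2^31, while the second gives 2^32 - 2^31, because bit 31 of the low half is read as a sign bit.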

score 9 · Accepted Answer · answered Mar 24 '13 at 13:08

Your second option can be made to work, though it's a little unwieldy. I'll assume that your 64-bit number is initially in edx:eax.

cvtsi2sd xmm0, edx              // high part * 2**-32
mulsd    xmm0, [2**32 from mem] // high part
movsd    xmm2, [2**52 from mem]
movd     xmm1, eax
orpd     xmm1, xmm2             // (double)(2**52 + low part as unsigned)
subsd    xmm1, xmm2             // (double)(low part as unsigned)
addsd    xmm0, xmm1             // (double)(high part + low part as unsigned)

All of the operations except for possibly the final one are exact, so this is correctly rounded. It should be noted that this conversion produces -0.0 when the input is 0 and the mxcsr is set to round-to-minus-infinity. This would need to be addressed if it were being used in a runtime library for a compiler aiming to provide IEEE-754 conformance, but is not an issue for most usage.
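
For readers who want to check the sequence without writing assembly, the following is a rough C emulation of the same steps, with each instruction noted in a comment. It is only a sketch: the helper names are hypothetical, `memcpy` stands in for viewing the register bits as a double, and it assumes IEEE-754 binary64 doubles, the default round-to-nearest mode, and an arithmetic right shift for negative values.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double (stands in for the xmm register view). */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Unsigned 32-bit -> double via the 2**52 trick (hypothetical helper name). */
static double u32_to_double(uint32_t lo)
{
    const uint64_t two52_bits = UINT64_C(0x4330000000000000); /* bit pattern of 2**52 */
    /* movd + orpd: lo lands in the low mantissa bits, giving exactly 2**52 + lo. */
    double biased = bits_to_double(two52_bits | lo);
    /* subsd: remove the 2**52 bias; exact. */
    return biased - bits_to_double(two52_bits);
}

/* Full conversion of the value held in edx:eax (hypothetical helper name). */
static double i64_to_double(int64_t v)
{
    int32_t  hi = (int32_t)(v >> 32); /* edx; assumes arithmetic shift */
    uint32_t lo = (uint32_t)v;        /* eax */
    double high = (double)hi * 4294967296.0; /* cvtsi2sd + mulsd; exact */
    return high + u32_to_double(lo);         /* addsd: the only rounding step */
}
```

The OR works because 2**52 has an all-zero mantissa and a unit in the last place of exactly 1, so any 32-bit value dropped into the low mantissa bits is added as an integer.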
