14

What is the best/fastest way to load a 64-bit integer value as a double into an xmm register using SSE2 in 32-bit mode?

In 64-bit mode, cvtsi2sd can be used, but in 32-bit mode, it supports only 32-bit integers.

So far I haven't found much beyond:

  • use fild and fstp to go through the stack, then movsd into an xmm register
  • convert the high 32 bits, multiply by 2^32, then add the low 32 bits

The first solution is slow, and the second might introduce precision loss (edit: and it is slow anyway, since the low 32 bits have to be converted as unsigned...)

Any better approach?

Eric Grange
  • Multiplying the top 32 bits by 2**32 in floating-point isn't going to truncate/round them. It's only when you add the low 32 bits to them the sum gets rounded/truncated and that's what you'll get with the first method anyway. Unless I'm missing something, these two methods are equivalent (except for performance). – Alexey Frunze Mar 22 '13 at 12:16
  • 3
    FWIW gcc seems to use the first approach (fild, fst, movsd). – Paul R Mar 22 '13 at 12:31
  • 2nd option is slow actually, I mistakenly used cvtsi2sd for the low 32-bit, but that was incorrect, it needs to be converted as unsigned, for which no CPU instruction exists, so it is slow... – Eric Grange Mar 22 '13 at 12:59
  • 3
    There is a trick with internal representation of IEEE doubles and magic constants, for example: http://software.intel.com/en-us/forums/topic/301988, but don't know about speed – MBo Mar 22 '13 at 13:36
  • 1
    And better explanation (for unsigned) here: http://stackoverflow.com/questions/13734191/are-there-unsigned-equivalents-of-the-x87-fild-and-sse-cvtsi2sd-instructions – MBo Mar 22 '13 at 13:54
  • There is indeed a very fast method that I use. It's related to MBo's suggestion. But it's very hacky and only works for a range of numbers. – Mysticial Mar 22 '13 at 17:19
  • @Mysticial Yes, I know them, but in case of overflow, I want to preserve the high order bits (as an fild does), and not the first 52 low order bits. – Eric Grange Mar 22 '13 at 20:24
  • Ah ic. Then you're probably out of luck. At least I'm not aware of anything else. – Mysticial Mar 22 '13 at 20:37
  • Yet another reason why 32-bit is obsolete. BTW, for vector integer<->double, AVX512 will finally introduce packed 64-bit integer <-> double conversions. Until then, even in 64-bit mode, there's just been [CVTDQ2PD xmm1, xmm2/m64](http://www.felixcloutier.com/x86/CVTDQ2PD.html) which converts a pair of 32-bit integers. – Peter Cordes Sep 18 '16 at 06:37

1 Answer

9

Your second option can be made to work, though it's a little unwieldy. I'll assume that your 64-bit number is initially in edx:eax.

cvtsi2sd xmm0, edx              // (double)(high dword as signed)
mulsd    xmm0, [2**32 from mem] // high part scaled by 2**32 (exact)
movsd    xmm2, [2**52 from mem]
movd     xmm1, eax              // low dword in bits 0..31, upper bits zeroed
orpd     xmm1, xmm2             // (double)(2**52 + low part as unsigned)
subsd    xmm1, xmm2             // (double)(low part as unsigned)
addsd    xmm0, xmm1             // (double)(high part + low part as unsigned)

All of the operations except for possibly the final one are exact, so this is correctly rounded. It should be noted that this conversion produces -0.0 when the input is 0 and the mxcsr is set to round-to-minus-infinity. This would need to be addressed if it were being used in a runtime library for a compiler aiming to provide IEEE-754 conformance, but is not an issue for most usage.
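
For readers who would rather experiment from C than from assembly, here is a minimal sketch of the same sequence written with SSE2 intrinsics. It is an illustration under assumptions, not code from the answer: the function name `int64_to_double_sse2` is made up, the edx:eax split is done with ordinary C shifts and casts, and the default round-to-nearest mode is assumed. A compiler is free to schedule or fold the intrinsics differently from the hand-written listing.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper mirroring the edx:eax sequence above.
   Converts a signed 64-bit integer to double using only 32-bit
   integer-to-double conversion plus exact floating-point steps. */
static double int64_to_double_sse2(int64_t v)
{
    /* Split into the signed high dword (edx) and the low dword (eax). */
    int hi = (int)(int32_t)((uint64_t)v >> 32);
    int lo = (int)(int32_t)(uint32_t)v;

    const __m128d two32 = _mm_set_sd(4294967296.0);        /* 2**32 */
    const __m128d two52 = _mm_set_sd(4503599627370496.0);  /* 2**52 */

    /* cvtsi2sd + mulsd: (double)hi * 2**32, which is exact. */
    __m128d high = _mm_cvtsi32_sd(_mm_setzero_pd(), hi);
    high = _mm_mul_sd(high, two32);

    /* movd + orpd: drop the low dword into the low mantissa bits of
       2**52, giving exactly 2**52 + (unsigned)lo. */
    __m128d low = _mm_castsi128_pd(_mm_cvtsi32_si128(lo));
    low = _mm_or_pd(low, two52);

    /* subsd: remove the 2**52 bias, leaving (double)(unsigned)lo. */
    low = _mm_sub_sd(low, two52);

    /* addsd: the only step that can round. */
    return _mm_cvtsd_f64(_mm_add_sd(high, low));
}
```

Under round-to-nearest this should agree with a plain `(double)v` cast for every input, because only the final addition can round. In a 32-bit build the function may need `-msse2` (or the equivalent compiler option) to compile.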

Stephen Canon