In assembly, how to handle Windows API with a return value that's documented as less than native register width?

Question

For many years, ntohs (32-bit version) returned its result zero-extended into the upper 16 bits (the high word) of EAX. However, after a recent Windows 10 update, it sometimes returns garbage in the upper word.

For example, passing argument 0xF00D to

call        ntohs

the return values are

EAX = 00F00DF0 (32-bit code)
RAX = 0000000000000DF0 (64-bit code)

The fix in MASM code is to zero-extend the return register using movzx (the same thing MSVC does):

call        ntohs
movzx       eax, ax
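
To make the effect of the movzx concrete, here is a minimal C sketch that models the return register as a plain integer. The function name `low_word` is hypothetical and only mirrors what `movzx eax, ax` does; it is not part of any Windows API.

```c
#include <stdint.h>

/* Hypothetical model of `movzx eax, ax`: keep the low 16 bits of the
   32-bit register value and clear the upper 16 bits. */
static uint32_t low_word(uint32_t eax)
{
    return eax & 0xFFFFu;
}
```

With the value from the example above, `low_word(0x00F00DF0)` gives `0x00000DF0`, whereas comparing the unmasked register against `0x0DF0` would fail.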

Question

Is it correct that, for any Windows API whose return value is documented as narrower than the native register width, there is no guarantee that the upper bits are zeroed, i.e., that the return value RAX = 0000000000000DF0 (64-bit) was most likely coincidental?

You're talking about C-compatible Windows DLLs, not "all APIs" correct? C-compatible functions follow the per-platform C ABI. COM objects have a more complicated binary interface which is also well-defined (and references the C ABI). WinRT has yet a different binary interface, abstracted through platform-agnostic metadata. And that doesn't even get into things like .NET which ship with the OS but aren't really OS APIs. — Ben Voigt, Oct 07 '21 at 17:06
Who is looking at those upper bits of the return value? C code appears not to: https://godbolt.org/z/37x96qz37 (it zero extends ax, though why it doesn't just `test ax,ax` I don't know..), so as far as I can tell, it isn't looking at those upper bits. — Erik Eidt, Oct 07 '21 at 18:44
MSVC does a [movzx ecx, ax] so ecx contains the 2 reversed bytes and the upper 16 bits cleared. But, in MASM, I've seen code incorrectly taking eax (instead of ax) by assuming the upper bits were zeroed out. — vengy, Oct 07 '21 at 19:39
For 32 bits, https://learn.microsoft.com/en-us/cpp/cpp/argument-passing-and-naming-conventions?view=msvc-160 *seems* to say that "Return values are also widened to 32 bits and returned in the EAX register", which I think would contradict what you observed. But I'm not so familiar with the many different x86 calling conventions on Windows, so I'm not ready to call this a Windows library bug. — Nate Eldredge, Oct 07 '21 at 19:52
Note that it's quite inefficient to actually set up args and `call ntohs` (and treat the call-clobbered regs as clobbered) when you could just `rol ax, 8` (or `ntohl` -> `bswap eax` or whatever register is convenient). You're programming in x86 asm and it's a known fact that x86 is little-endian. That would also solve the zero-extension problem because you can use `movzx` yourself, or for 32-bit, writing a 32-bit register guarantees zero-extension. — Peter Cordes, Oct 07 '21 at 20:50
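
The byte swaps suggested in the last comment can be sketched in C as follows. The helper names `swap16` and `swap32` are made up for illustration; they model `rol ax, 8` and `bswap eax` respectively. On a little-endian host such as x86 they should compute the same results as `ntohs` and `ntohl`.

```c
#include <stdint.h>

/* Models `rol ax, 8`: swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Models `bswap eax`: reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

Because `swap16` returns a `uint16_t`, a C caller never sees stale upper bits; in hand-written assembly the equivalent step is an explicit `movzx` after the rotate.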

score 4 · Accepted Answer · answered Oct 07 '21 at 19:38

At least for x64, the upper bits are undefined and the caller is responsible for doing zero- or sign-extension if needed. If your code relied on upper bits being zero, I'm afraid it was buggy all along, and you have just been lucky until now.

https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160

"The state of unused bits in the value returned in RAX or XMM0 is undefined."
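
What "the caller is responsible" means in practice depends on the documented return type. As a rough C model (the names `extend_u16` and `extend_s16` are hypothetical, and the 64-bit argument stands in for RAX), the caller keeps only the low 16 bits and then zero- or sign-extends them:

```c
#include <stdint.h>

/* Zero-extension, as `movzx eax, ax` would do for an unsigned 16-bit result. */
static uint64_t extend_u16(uint64_t rax)
{
    return rax & 0xFFFFu;
}

/* Sign-extension, as `movsx rax, ax` would do for a signed 16-bit result. */
static int64_t extend_s16(uint64_t rax)
{
    int64_t low = (int64_t)(rax & 0xFFFFu);
    return (low ^ 0x8000) - 0x8000;   /* portable sign-extension of bit 15 */
}
```

Either way, whatever was in the upper 48 bits of the modeled register is discarded before the value is used.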

answered Oct 07 '21 at 19:38

Nate Eldredge


Perfect! "The state of unused bits in the value returned in RAX or XMM0 is undefined." Thanks! – vengy Oct 07 '21 at 20:15
@vengy: Note that other x86 calling conventions work the same way, such as x86-64 System V, and i386 System V (used on non-Windows OSes). On some other ISAs, like MIPS and PowerPC, args and also return values are expected to be sign-extended to fill a register, I think. But not on x86 where narrow operand-sizes are available. (x86-64 SysV has an unofficial extension relied on by clang where [narrow *args* are expected to be extended to 32-bit](https://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for/36760539#36760539)) – Peter Cordes Oct 07 '21 at 20:53
Not zeroing the upper part of EAX is a bad sign for efficiency, presumably meaning it's doing a `mov` to AX [with a false dependency on the old value of EAX](https://stackoverflow.com/questions/41573502/why-doesnt-gcc-use-partial-registers), instead of `movzx`. Yet another reason to just `rol ax, 8` yourself instead of copying an arg to the stack for a stdcall function! – Peter Cordes Oct 07 '21 at 20:55
