How does the data register actually work in Assembly (linux, 32-bit, nasm)?

Question

I've been getting into reverse engineering lately and wanted to learn more about ASM. As such I looked up some tutorials, but have found confusing information regarding the dx register.

I've written a small "Hello World" example here:

section .text
    global _start

_start:
    mov edx, mes_len ; Store where the bytes end for the data in the data register
    mov ecx, message ; Store the reference to the message that we're going to write in the counter register
    mov ebx, 1 ; Tell the base register that we want to push to the stdout
    mov eax, 4 ; Move 4 into the Accumulator which is the equivalent of calling sys write on linux
    int 0x80 ; int means interrupt, 0x80 means interrupt the kernel to make our system calls

    mov eax, 1 ; Move 1 into accumulator, which is sys exit on linux
    mov ebx, 0 ; exit code 0
    int 0x80

section .data
    message: db "Hello, World", 10, 0 ; 10 is the newline character and 0 is the null character
    mes_len: equ $ - message ; "$" means current position, then we subtract that from the length of our message

and the line

mov edx, mes_len ; Store where the bytes end for the data in the data register

doesn't make sense to me. According to this source, edx is used alongside the accumulator for complex calculations like division or multiplication, but also for I/O. It then states that the data register holds the address of the message, but doesn't the accumulator also hold the address of the message?

Any help would be appreciated.

The GPRs do not have one single use each. The Linux `int 80h` interface for the `write` function just so happens to expect the address in `ecx` and the message **length** (not address!) in `edx`. If you wrote your own operating system you could decide what registers to use for a function call like this. I don't know where you got "the data register holds the address of the message" from, that's wrong. The accumulator (`eax`) also doesn't hold the address, it holds the function number for the `int 80h` system call. — ecm, Apr 07 '23 at 11:10
`message` shouldn't contain a zero byte (`0`) for writing a buffer's data to stdout. And "then we subtract that from the length of our message" is also wrong, we subtract the message's **address** from `$` (current offset after the message's end) to **get the message length**. — ecm, Apr 07 '23 at 11:12
Related: [Hello, world in assembly language with Linux system calls?](https://stackoverflow.com/q/61519222) for correct detailed explanations of all parts of this, unlike the sometimes-misleading comments in your version. Also [What is the explanation of this x86 Hello World using 32-bit int 0x80 Linux system calls from \_start?](https://stackoverflow.com/q/45052162) — Peter Cordes, Apr 07 '23 at 21:06
Related: [Why are x86 registers named the way they are?](https://stackoverflow.com/questions/892928/why-are-x86-registers-named-the-way-they-are) — Nate Eldredge, Apr 08 '23 at 04:28
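
Putting the corrections from the comments above together, a minimal sketch of the program might look like the following. It assumes a 32-bit build with `nasm -f elf32` and `ld -m elf_i386` (those build commands are not part of the question), drops the trailing zero byte, and rewrites the code comments to describe what each register means to the `write` call rather than to its historical name.

```nasm
; Assumed build: nasm -f elf32 hello.asm && ld -m elf_i386 -o hello hello.o
section .data
    message: db "Hello, World", 10      ; 10 is the newline; no trailing 0 is needed
    mes_len: equ $ - message            ; current offset minus message's address = 13 bytes

section .text
    global _start

_start:
    mov eax, 4          ; system call number: sys_write
    mov ebx, 1          ; 1st argument: file descriptor 1 (stdout)
    mov ecx, message    ; 2nd argument: address of the buffer
    mov edx, mes_len    ; 3rd argument: number of bytes to write (a length, not an address)
    int 0x80            ; enter the kernel

    mov eax, 1          ; system call number: sys_exit
    xor ebx, ebx        ; 1st argument: exit status 0
    int 0x80
```

Note that `mes_len` is an assemble-time constant, so `mov edx, mes_len` loads the value 13 directly; nothing is read from memory for it.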

Mike Nakis · Accepted Answer · 2023-04-08T09:16:39.773

2

The x86 registers are general-purpose registers. For historical reasons, some of them are associated with particular roles, but as Intel CPUs have evolved over the decades, this has become less and less true. Certain special uses remain: for example, `loop` uses CX/ECX as its counter, and `mul`/`div` use DX/EDX as an extension of the accumulator.
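
As a rough illustration of those remaining special uses, here is a small sketch (the values and the `.again` label are made up for the example, and the same assumed 32-bit `nasm -f elf32` / `ld -m elf_i386` build applies). `mul` and `div` implicitly treat EDX:EAX as one 64-bit value, and `loop` implicitly counts down in ECX.

```nasm
section .text
    global _start

_start:
    ; MUL: 32-bit x 32-bit gives a 64-bit product in EDX:EAX
    mov eax, 100000
    mov ecx, 70000
    mul ecx             ; EDX:EAX = 7,000,000,000 (EDX = 1, EAX = 0xA13B8600)

    ; DIV: divides the 64-bit value in EDX:EAX by the operand
    mov ecx, 1000000
    div ecx             ; EAX = 7000 (quotient), EDX = 0 (remainder)

    ; LOOP: decrements ECX and jumps while ECX is not zero
    xor ebx, ebx        ; EBX = 0, used here as a plain scratch register
    mov ecx, 5
.again:
    add ebx, 2
    loop .again         ; runs 5 times, leaving EBX = 10

    mov eax, 1          ; sys_exit, with the status already in EBX
    int 0x80
```

Running the program and then `echo $?` should print 10. Outside of instructions like these, nothing stops you from using ECX or EDX for any other value.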

Software interrupts such as `int 0x80` are a different matter: they are essentially function calls. Whoever implements the interrupt handler (in your case, the kernel) is free to define what each register must contain when the interrupt is invoked and what each register will contain when it returns. They can assign roles to registers as they please, and these roles need not bear any relation to the historical roles of the registers.
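
For the 32-bit Linux `int 0x80` interface specifically, the kernel's convention is: system call number in EAX, arguments in EBX, ECX, EDX (then ESI, EDI, EBP), and the return value back in EAX. A small sketch of both halves of that contract, using a made-up three-byte message `msg` and writing to stderr (file descriptor 2) to show that EBX is just the first argument:

```nasm
section .data
    msg:     db "hi", 10
    msg_len: equ $ - msg    ; 3 bytes

section .text
    global _start

_start:
    mov eax, 4          ; in:  system call number (sys_write)
    mov ebx, 2          ; in:  1st argument, file descriptor 2 (stderr)
    mov ecx, msg        ; in:  2nd argument, buffer address
    mov edx, msg_len    ; in:  3rd argument, byte count
    int 0x80            ; out: EAX = bytes written, or a negative error code

    mov ebx, eax        ; reuse the return value as the exit status
    mov eax, 1          ; sys_exit
    int 0x80
```

If the write succeeds in full, `echo $?` afterwards should show 3. The kernel chose this register assignment; the instruction set did not require it.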

In the text that you linked to, I was unable to find any reference to either EAX or EDX being used to contain an address. There is probably no CPU instruction that requires EDX to contain a memory address (ESI and EDI usually fulfill such roles), but an interrupt is absolutely free to require EDX to contain a memory address.

edited Apr 08 '23 at 09:16

answered Apr 07 '23 at 11:39


2

`in`/`out` (and [`ins`](https://www.felixcloutier.com/x86/ins:insb:insw:insd)/`outs`) instructions use DX to hold the port number, an "address" in I/O space, not memory. It's a different address-space, so you wouldn't normally just say "address", and certainly not "address of the message". But that doesn't explain why the OP also thinks the accumulator would also hold the address of the message, because that's not the case for any of those instructions. `in`/`out` use AL/AX/EAX to hold values, not addresses. – Peter Cordes Apr 07 '23 at 21:04
@PeterCordes you are right, I replaced "address" with "memory address". – Mike Nakis Apr 08 '23 at 09:16
I meant that more as a correction to the question's terminology (and a guess at what they may have read, maybe without fully understanding), but yeah, good edit, that makes it clearer what you're saying. – Peter Cordes Apr 08 '23 at 09:24
You were correct to correct my terminology. I think your explanation and the other links provided ([particularly this one](https://stackoverflow.com/questions/61519222/hello-world-in-assembly-language-with-linux-system-calls)) have given me more than enough to actually understand this now. I find it finicky that there are so many different ways to do things depending on architecture/OS. That makes sense, but it is difficult to learn without a single defined standard (hence my confusion about the registers being used differently from how they were defined in some texts). Regardless, thank you! – Yelnat Apr 08 '23 at 18:11
