
I'm stepping into the world of Assembly Language Programming. I'm trying to understand everything found on: https://www.tutorialspoint.com/assembly_programming

I came across the code below:

section .text
     global _start      ;must be declared for using gcc
_start: ;tell linker entry point

;This part works fine.
;mov    edx, len    ;message length
;mov    ecx, msg    ;message to write

;This does not work because I interchanged edx and ecx.
mov ecx, len    ;message length
mov edx, msg    ;message to write

mov ebx, 1      ;file descriptor (stdout)
mov eax, 4      ;system call number (sys_write)
int 0x80        ;call kernel
mov eax, 1      ;system call number (sys_exit)
int 0x80        ;call kernel

section .data

msg db  'Hello, Kaunda!',0xa    ;our dear string
len equ $ - msg         ;length of our dear string

Can I choose to put the variable 'len' or 'msg' in any of the data registers (EAX, EBX, ECX and EDX)?

In other words:

WHY is the content of the variable len transferred into the EDX register and not into ECX or any other register? Is there a clear guideline for knowing which variable goes into which register?

I've read about the functions of each of the registers EAX, EBX, ECX and EDX, but I'm still not clear. Their functions look similar to me.

Update: I'm running the code on https://www.tutorialspoint.com/compile_assembly_online.php

I think that is a Linux environment.

phuclv
Kaunda
  • This is less about assembly language per se than the system call ABI. The kernel looks for the arguments of the system call in particular registers, because that's just how it works. It obviously has to have a fixed correlation between arguments and registers because it has no other means of knowing which one is which. So, you have to indicate what OS you're coding for in order to look up the particular syscall ABI it uses. – Ken Thomases Nov 17 '18 at 21:28
  • “Interchangeable”? Well, at a single instruction level - yes. BUT take a look at what the kernel call expects to have in registers when it is called. – DisappointedByUnaccountableMod Nov 17 '18 at 21:29
  • @barny what does "single instruction level" mean? – Kaunda Nov 17 '18 at 21:52
  • @Kaunda: he means that instructions like `imul eax, ecx` and `imul edx, ebx` both do the same thing (to different regs), and the CPU doesn't care if you keep a loop counter in EBX or EDX. So for the most part register allocation is a free choice within a function. But x86 definitely has special-purpose uses for each register, e.g. variable-count shifts only work with the count in `cl`, unless you have BMI2 `shrx` / `shlx`. Anyway, the main reason for choosing one register over another is calling-convention reasons - an agreement between caller and callee about which argument will be where. – Peter Cordes Nov 17 '18 at 22:38
  • E.g. moving a value into a register - the things you changed. BUT the kernel call API is fixed, and if you don’t put values in the expected registers or in the expected order on the stack then it won’t work. – DisappointedByUnaccountableMod Nov 18 '18 at 10:27
  • Got it!! I appreciate all the comments – Kaunda Nov 18 '18 at 18:36

1 Answer

When you issue an `int 0x80`, your program is interrupted and the kernel inspects the state of the registers. From `eax` it takes the number of the system call you want to execute, and from the other registers it takes the arguments. For the write system call, for example, it takes the file descriptor from `ebx`, a pointer to the buffer you want to write from `ecx`, and the number of bytes you want to write from `edx`. The kernel does not know what your intentions are; it simply uses whatever is in the registers, so it does matter which registers you use.
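As a minimal sketch of that mapping, here is the question's program with each register labelled by the argument of `write(fd, buf, count)` it carries. The comments are mine, and the zero exit status is an addition the original code does not set.

```nasm
; write(fd, buf, count) through the 32-bit int 0x80 interface on Linux
section .data
msg     db  'Hello, Kaunda!', 0xa
len     equ $ - msg

section .text
        global _start
_start:
        mov     eax, 4          ; system call number: sys_write
        mov     ebx, 1          ; 1st argument, fd: stdout
        mov     ecx, msg        ; 2nd argument, buf: address of the bytes
        mov     edx, len        ; 3rd argument, count: number of bytes
        int     0x80            ; kernel reads eax, ebx, ecx and edx

        mov     eax, 1          ; system call number: sys_exit
        xor     ebx, ebx        ; 1st argument, exit status 0 (my addition)
        int     0x80
```

Assuming the file is saved as `hello.asm` (a name picked for illustration), it should build with `nasm -f elf32 hello.asm` followed by `ld -m elf_i386 -o hello hello.o`.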

However, in general, it does not matter which registers you use for which values. In your own code, you are free to use almost all registers (except for registers such as esp) for whatever purpose you want, as long as you don't interact with other people's code.

The only time it matters which registers are used is when you interact with code written by other people, such as when calling functions or the operating system, or when writing functions that are going to be called by others. In such cases, you have to set the relevant registers to the expected values and possibly preserve their contents.

For example, when you write a function called by other people's code, it is expected that you return the result of your function in eax and preserve the contents of the registers ebx, esi, edi, esp, and ebp. If you use these registers for your own purposes, you have to first save their values somewhere (e.g. on the stack) and restore them to their original values before returning.
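A small sketch of such a function might look like the following. The name `sum_array` and its C-style signature are hypothetical, and I'm assuming the usual i386 convention in which arguments are passed on the stack.

```nasm
; hypothetical function: int sum_array(const int *p, int n)
section .text
        global sum_array
sum_array:
        push    ebx                 ; ebx must be preserved, so save it before use
        mov     ebx, [esp + 8]      ; p  ([esp] = saved ebx, [esp + 4] = return address)
        mov     ecx, [esp + 12]     ; n  (ecx need not be preserved)
        xor     eax, eax            ; running sum, which is also the return value
        test    ecx, ecx
        jle     .done               ; nothing to add if n <= 0
.next:
        add     eax, [ebx]          ; add the current element
        add     ebx, 4              ; advance to the next 32-bit element
        dec     ecx
        jnz     .next
.done:
        pop     ebx                 ; restore the caller's ebx
        ret                         ; result is returned in eax
```

Using `ebx` as the pointer here is a free choice; the only obligations are the save and restore around it and leaving the result in `eax`.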

There are also some instructions that expect their operands to be in certain registers (such as stos or idiv), but for most instructions you are free to choose whatever registers you want.
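To illustrate, here is a short sketch of a few instructions with fixed register operands: `idiv`, `stos`, and the variable-count shift mentioned in the comments on the question. The buffer `buf` and the chosen constants are made up for the example.

```nasm
section .bss
buf     resb 16                 ; hypothetical scratch buffer

section .text
        global _start
_start:
        ; idiv: the dividend is always edx:eax; quotient -> eax, remainder -> edx
        mov     eax, 17
        cdq                     ; sign-extend eax into edx
        mov     ecx, 5          ; the divisor may be any register or memory operand
        idiv    ecx             ; eax = 3, edx = 2

        ; a variable shift count must be in cl
        mov     cl, 3
        shl     eax, cl         ; eax = 3 << 3 = 24

        ; stos: stores al at [edi] and advances edi; rep repeats it ecx times
        mov     edi, buf
        mov     al, '*'
        mov     ecx, 16
        cld                     ; clear the direction flag so edi counts upward
        rep stosb               ; fills buf with 16 '*' bytes

        mov     eax, 1          ; sys_exit
        xor     ebx, ebx        ; exit status 0
        int     0x80
```

Everywhere else in this snippet, such as the choice of `ecx` for the divisor, a different register would work just as well.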

In the cases where it matters, the rules for which registers are used for what purpose are written down in an Application Binary Interface (ABI) document. This document can be understood as an agreement between all programmers as to which data to expect in which registers when calling functions or the operating system. Strict adherence to the ABI is necessary to make your code work correctly when calling or being called by other people's code.

On i386, the architecture you are currently programming for, Linux uses the i386 SysV ABI. Generally, each operating system uses a different ABI for each architecture, so before writing code for a new operating system or architecture, be sure to check out the relevant ABI.

fuz
  • Interesting! Very educative. Can you please explain further why the code above didn't print "Hello, World" when I moved len into the ecx register and msg into the edx register? It didn't produce any error message either. The reverse works (that is, when I put len in edx and msg in ecx, "Hello, World" is displayed). I was expecting it to work, since I have control over what goes into which register. – Kaunda Nov 17 '18 at 22:47
  • The reason there is no error message is that the code doesn't check for an error and print a message. The write system call did return an error (EFAULT, most likely). After `int 0x80` it should check for eax < 0. (Note, it returns EFAULT because the value in ecx is not a valid pointer.) – prl Nov 17 '18 at 23:58
  • @Kaunda Well, when you put the length in `ecx` and the pointer to the message into `edx`, the kernel still thinks that `ecx` contains the pointer to the message and `edx` contains the length of the message. A length interpreted as a pointer points nowhere useful and a pointer interpreted as a number is usually a very large number, so the kernel returns the error `EFAULT` meaning “invalid address”, as @prl already explained. It's your job to turn the kernel's error codes into error messages. The kernel itself is rarely concerned with this. – fuz Nov 18 '18 at 00:18
  • So to sum it up: you might have control over what goes into which register, but the kernel has no way to know what you meant. It simply assumes that you stick to the conventions laid out in the ABI. If you don't follow them, weird things happen. – fuz Nov 18 '18 at 00:20