
I want to write a bootloader, which simply prints "Hello World!" on the screen and I don't know why my bytes get mixed up. I'm trying to write it in AT&T syntax (please don't recommend Intel syntax) and trying to convert the code from this tutorial to AT&T syntax.

Now here is the rather short code for my bootloader:

start:
.code16         #real mode
.text
.org 0x0
.globl _main
_main:
    movw hello, %si
    movb $0x0e, %ah

loophere:
    lodsb
    or %al, %al     #is al==0 ?
    jz halt         #if previous instruction sets zero flag jump to halt
    int $0x10       #run bios interrupt 0x10 (ah is set to 0x0e so a character is displayed)
    jmp loophere


halt:
    cli
    hlt


hello:  .ascii "Hello world!\0"


filloop:    
    .fill (510-(.-_main)),1,0   #I hope this works. Fill bootloader with 0's until byte 510


end:
    .word 0xaa55

Now if I compile this with

$as -o boot.o boot.as
$ld -Ttext 0x07c00 -o boot.elf boot.o
$objcopy -O binary boot.elf boot.bin

the following command

$objdump -d boot.elf

gives me this disassembly

Disassembly of section .text:

0000000000007c00 <_main>:
    7c00:   8b 36                   mov    (%rsi),%esi
    7c02:   11 7c b4 0e             adc    %edi,0xe(%rsp,%rsi,4)

0000000000007c06 <loophere>:
    7c06:   ac                      lods   %ds:(%rsi),%al
    7c07:   08 c0                   or     %al,%al
    7c09:   74 04                   je     7c0f <halt>
    7c0b:   cd 10                   int    $0x10
    7c0d:   eb f7                   jmp    7c06 <loophere>

0000000000007c0f <halt>:
    7c0f:   fa                      cli    
    7c10:   f4                      hlt    

0000000000007c11 <hello>:
    7c11:   48                      rex.W
    7c12:   65 6c                   gs insb (%dx),%es:(%rdi)
    7c14:   6c                      insb   (%dx),%es:(%rdi)
    7c15:   6f                      outsl  %ds:(%rsi),(%dx)
    7c16:   20 77 6f                and    %dh,0x6f(%rdi)
    7c19:   72 6c                   jb     7c87 <filloop+0x69>
    7c1b:   64 21 00                and    %eax,%fs:(%rax)

0000000000007c1e <filloop>:
    ...

0000000000007dfe <end>:
    7dfe:   55                      push   %rbp
    7dff:   aa                      stos   %al,%es:(%rdi)

If I hexdump it (you can also see the bytes in the disassembly above), my first 6 bytes are

8b 36
11 7c b4 0e

compared to `be 10 7c b4 0e` from the tutorial (the rest of the hexdump is identical, byte for byte). Now I understand that `ac` is the opcode for `lodsb` (load string byte), so `b4 0e` would have to load `0x0e` into `%ah`, and `be 10 7c` would have to point `%si` at the `hello` label at address 0x7c10 (mind the little-endian byte order). I changed the corresponding bytes with a hex editor and it suddenly worked, although the disassembly then mixed things up like this:

0000000000007c00 <_main>:
    7c00:   be 10 7c b4 0e          mov    $0xeb47c10,%esi
    7c05:   ac                      lods   %ds:(%rsi),%al

My original version just printed a capital 'S'. Can someone help me as to why these first instruction bytes get set differently?

I'm coding all this on Debian 9 64-bit and running it on qemu-system-x86_64 as a floppy.

Michael Petch
Keya Kersting
  • If you want objdump to dump out 16-bit encoding add the `-Mi8086` option – Michael Petch Oct 26 '17 at 13:22
  • I can sense multiple problems (but I'm not going to validate all the details, so I'm just dropping some hints). 1) I don't believe that disassembly is correct: you need a 16-bit disassembler, while that disassembly treats the machine code the way 64-bit mode would. 2) `movw hello, %si` loads a value from memory (i.e. the letters `"He"` are used as the value, and that is loaded into `si`). I guess adding `$` ahead of `hello` should fix that (or use `lea` instead of `mov`). 3) You should compensate for various BIOSes loading your code at different cs:ip, some `07C0:0000`, some `0000:7C00`, so start with a far jmp. – Ped7g Oct 26 '17 at 13:26
  • Overall maybe you should start with few short DOS apps to practice x86-16 assembly, if you have some nice DOS debugger, I guess it's simpler+faster to start dosbox + turbo debugger than booting up whole emulated x86 machine in something like BOCHS. You can call the `int 10h` BIOS service in similar way like in bootloader, but the COM files under DOS are not completely as tricky as bootloaders. Also I believe AT&T syntax is more for machines than humans. I wouldn't dare to write anything in it, except bare minimum to boot something more usable (but there's no reason to use AT&T at all). – Ped7g Oct 26 '17 at 13:28
  • 4) you should load ds. – prl Oct 26 '17 at 13:30
  • I have a number of bootloader tips in this [SO answer](https://stackoverflow.com/a/32705076/3857942). Although the answer uses NASM Intel syntax, the tips themselves apply equally. – Michael Petch Oct 26 '17 at 13:32
  • Thanks it helped a bit. I still have a problem though, the 16bit instruction get interpreted as `8b 36 11 7c` which means `mov 0x7c11,%si`. The `be 10 7c` similarly means `mov $0x7c10,%si`, though as I looked up in the opcodes the address 7c10 gets interpreted as an immediate operand. So I guess there seems to be a problem with the compilation? – Keya Kersting Oct 26 '17 at 13:59
  • `movw hello, %si` moves the 16-bit word at memory location hello into _SI_. That's not what you want. You want to move the address of the label `hello` (effectively the pointer) into _SI_. To do that properly you need to place a `$` in front to mean that it is an immediate operand. `mov $hello, %si` would be correct. – Michael Petch Oct 26 '17 at 14:11
  • Ooohhh, thank you Mr. (Mrs.?) Ped7g. The `$` before `hello` fixed it, as did the `lea` instruction. The `$` prefix led to the exact same hex code, whereas the `lea` instruction was not the same but produced the same result. I may have a look at DOS, but I probably only ever want to do a bootloader in 16-bit because otherwise it's not widely used. Thank you very much for helping me and solving this problem. Have a nice day! – Keya Kersting Oct 26 '17 at 14:16
  • Also, to test whether a register contains zero, rather use `test %al,%al` than `or`. `test` is designed specifically to discard the result and update only the flags (in the same way as `and` would), while `or` writes back to the target register, so its performance may be less optimal in certain scenarios (like the one you are using it for). The major difference between starting with DOS vs. a bootloader is the availability of debuggers... and programming in assembly without a debugger is very difficult; you don't want to do that. The descriptions of instructions will become much clearer when you try them in a debugger. – Ped7g Oct 26 '17 at 16:55
  • @Ped7g: Although the Bochs debugger isn't symbolic, it is very good at debugging real mode code and I do recommend it for general debugging of bootloader code. QEMU can be used for remote real mode bootloader debugging but it has some serious limitations (mainly due to the fact that it has no notion of real mode segment:offset addressing). If you code your bootloader within certain parameters QEMU can be useful and it will do symbolic debugging. – Michael Petch Oct 26 '17 at 18:03

1 Answer


If you want OBJDUMP to decode the instructions as 16-bit code, you need to tell it so with the `-Mi8086` option, e.g. `objdump -d -Mi8086 boot.elf`. Since you created a 64-bit object with AS and LD, it decodes the bytes as 64-bit instructions by default. `-M` overrides that, and `i8086` selects 16-bit instruction decoding.

Many of the problems in your code are related to not setting up the segment registers properly, including DS. I discuss many of these issues in my Bootloader Tips. In addition, AT&T syntax requires a `$` in front of a label if you want its address (an immediate operand) rather than the data stored there. `movw hello, %si` should be `movw $hello, %si`. Alternatively, you can use LEA, which takes a memory operand and just computes the address (but doesn't retrieve the data). In that case you don't use a `$` sign. `leaw hello, %si` should also work.
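To make the difference concrete, here is a small side-by-side sketch of the three forms. The label `demo` is only illustrative, and the opcode bytes in the comments are the standard 16-bit encodings (the first two match the bytes quoted in the question); `<addr16>` stands for the little-endian address the linker fills in.

```asm
.code16
.text
demo:
    movw hello, %si     # memory operand: loads the 16-bit WORD stored at hello
                        # (the bytes "He"); encodes as 8b 36 <addr16>
    movw $hello, %si    # immediate operand: loads the ADDRESS of hello
                        # encodes as be <addr16>
    leaw hello, %si     # LEA: computes the address of hello without reading memory
                        # encodes as 8d 36 <addr16>

hello:  .ascii "Hello world!\0"
```

The first form is one byte longer than the second, which is why every label after it (including `hello`) ended up one byte further along than in the tutorial's hexdump.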

When using INT 10h/AH=0Eh you should set BH which is the page number to display to. 0 is the visible page.

With all this in mind this code should work:

start:
.code16         #real mode
.text
.globl _main
_main:
    xor  %ax, %ax      # We are using offset 0x7c00, thus we need to set the segments to 0x0000
    mov  %ax, %ds
    mov  %ax, %es
    mov  %ax, %ss      # Set the stack to grow down just below bootloader
    mov  $0x7c00, %sp
    cld                # Ensure forward movement of lods/movs/scas instructions

    movw $hello, %si   # We want the address of hello, not what it points at
    #leaw hello, %si   # Alternative way to get address with LEA instruction.
    movb $0x0e, %ah
    xor  %bh, %bh      # Make sure video page number is set (we want 0)

loophere:
    lodsb
    or %al, %al     #is al==0 ?
    jz halt         #if previous instruction sets zero flag jump to halt
    int $0x10       #run bios interrupt 0x10 (ah is set to 0x0e so a character is displayed)
    jmp loophere


halt:
    cli
    hlt


hello:  .ascii "Hello world!\0"


filloop:
    .fill (510-(.-_main)),1,0   #I hope this works. Fill bootloader with 0's until byte 510


end:
    .word 0xaa55
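
Two suggestions from the comments under the question can be folded in as well: starting with a far jump so that CS has a known value, and using `test` instead of `or` for the zero check. The following is only a sketch of that variant, not a tested replacement. It assumes the image is still linked with `-Ttext 0x07c00`, and the label names `init`, `print`, `done`, and `msg` are made up for illustration.

```asm
.code16
.text
.globl _main
_main:
    ljmp $0x0000, $init    # Far jump: forces CS=0x0000 however the BIOS entered
                           # (0x07c0:0x0000 and 0x0000:0x7c00 are the same address)
init:
    xor  %ax, %ax          # Labels are linked relative to 0x7c00, so use segment 0
    mov  %ax, %ds
    mov  %ax, %es
    mov  %ax, %ss          # Stack grows down from just below the bootloader
    mov  $0x7c00, %sp
    cld                    # lodsb must move forward through the string

    mov  $msg, %si         # Address of the string (note the $)
    mov  $0x0e, %ah        # BIOS teletype output function
    xor  %bh, %bh          # Video page 0

print:
    lodsb
    test %al, %al          # Sets ZF if AL is 0 without writing back to AL
    jz   done
    int  $0x10
    jmp  print

done:
    cli
    hlt

msg:    .asciz "Hello world!"

    .fill 510-(.-_main), 1, 0   # Pad with zeros up to byte 510
    .word 0xaa55                # Boot signature
```

`test` and `or` set the zero flag identically here, so the change does not alter behavior. It only avoids the redundant write to `%al`.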
Michael Petch