How can I write a code that loops 256 times using only 3 instructions and one 8-bit register (8086 instruction set)?

Question

A prof of mine posed this question, and I'm assuming the 8-bit register is either CL or CH. I got it working by simply moving 01H into the CH register, but I was wondering whether there is another way of doing this, since I am technically using the whole 16-bit CX register when running the code.

My code for reference:

MOV CH,01H
L1:INC AX    ;to keep count
LOOP L1

Peter Cordes · Accepted Answer · 2022-12-14T18:08:48.823

You're right, your code uses 16-bit CX. Much worse, it depends on CL being zero before this snippet executes! An 8-bit loop counter that starts at zero will wrap back to zero 256 decrements (or increments) later.

   mov  al, 0     ; uint8_t i = 0.   xor ax,ax is the same code size but zeros AH
loop_top:             ; do {
   dec  al
   jnz  loop_top  ; }while(--i != 0)

Nothing in the question said there needed to be any work inside the loop; this is just an empty delay loop.
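As a sanity check on the wrap-around claim, here is a minimal C sketch that models the loop above. It assumes a `uint8_t` stands in for AL; `count_iterations_u8` is a hypothetical helper name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: models  mov al, start / dec al / jnz loop_top
   and returns how many times the loop body runs. */
unsigned count_iterations_u8(uint8_t start)
{
    uint8_t i = start;      /* stands in for AL */
    unsigned n = 0;
    do {
        n++;                /* one pass through the loop body */
        i--;                /* dec al: wraps from 0 to 255 */
    } while (i != 0);       /* jnz loop_top */
    return n;
}
```

Under this model, `count_iterations_u8(0)` returns 256 and `count_iterations_u8(1)` returns 1, which matches the behaviour described above for a counter that starts at zero.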

Efficiency notes: dec ax is smaller than dec al (1 byte vs. 2), and loop rel8 is even more compact than dec/jnz. If you were optimizing for a real 8086 or 8088, you'd want the smaller loop body, because it runs more times than the code ahead of the loop. Of course, if you actually wanted to just delay, the dec al version would delay longer, since code-fetch would take more memory accesses. Overall code size is the same either way: mov ax, 256 (3 bytes) followed by dec ax (1 byte) totals the same as xor ax,ax (2 bytes) or mov al, 0 (2 bytes) followed by dec al (2 bytes).
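The 16-bit variants count the same way, only in a wider register. A rough C model, assuming a `uint16_t` stands in for AX or CX, also illustrates why the question's MOV CH,01H only gives 256 iterations when CL happens to be zero. Both `count_iterations_u16` and `cx_from_halves` are hypothetical helper names, not anything from the question or answer.

```c
#include <stdint.h>

/* Hypothetical helper: models  dec ax / jnz  or  loop rel8.
   Both decrement a 16-bit counter and repeat while it is nonzero. */
unsigned long count_iterations_u16(uint16_t start)
{
    uint16_t cx = start;    /* stands in for AX or CX */
    unsigned long n = 0;
    do {
        n++;                /* one pass through the loop body */
        cx--;               /* wraps from 0 to 0xFFFF */
    } while (cx != 0);
    return n;
}

/* Hypothetical helper: the value of CX given its two 8-bit halves. */
uint16_t cx_from_halves(uint8_t ch, uint8_t cl)
{
    return (uint16_t)(((unsigned)ch << 8) | cl);
}
```

In this model, `count_iterations_u16(cx_from_halves(1, 0))` returns 256, while a leftover CL of 5 gives 261, and a counter that starts at 0 runs 65536 times.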

This works the same with any 8-bit register; AL isn't special for any of these instructions, so you'd often want to keep it free for code that can benefit from its special encodings, such as cmp al, imm8 in 2 bytes instead of the usual 3.

(mov al, 0 vs. xor al,al: there is a false dependency either way on many modern CPUs. mov ah,0 might avoid a false dependency on Skylake; a mov from another register does, but an immediate might not. See "How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent". Anyway, xor-zeroing is generally not useful on byte registers.)

I tried doing something very similar but used the loop instruction instead, so I was stuck with it looping 2^16 times. I completely forgot about this way of looping, thank you :) — Chronos, Mar 08 '22 at 06:30
@Chronos: This is the normal way of looping; the `loop` instruction is just a code-size optimization ([at the expense of speed on most modern CPUs](https://stackoverflow.com/questions/35742570/why-is-the-loop-instruction-slow-couldnt-intel-have-implemented-it-efficiently)), and even then useful only in the special case where it's convenient to use RCX/ECX/CX, and when `do{}while(--cx != 0)` ends at a convenient CX value. It seems common for beginners to get stuck always using `loop`, even when they're already counting up a pointer or other counter in the same loop. — Peter Cordes, Mar 08 '22 at 09:24
Not every assembler will let you use the name "loop" for your loop label since it's already an instruction, just a reminder — puppydrum64, Dec 14 '22 at 17:55
@puppydrum64: Good point. YASM refuses it, NASM accepts it. IDK about emu8086. — Peter Cordes, Dec 14 '22 at 18:09
