5

I'd like to have a bit more understanding of how, on a circuitry/hardware level, an assembler program works.

I think I have a very broad-brush understanding of how a CPU would process machine code on a hardware level. Please bear with me for this very generalised, hypothetical example:

If you took 00101110 in machine code, with the first part 0010 as an opcode and the second part 1110 as location address...

I think I understand, broadly, how those 8 bits of data would be fed along 8 wires to an instruction register, and how from there the opcode 0010 gets fed along 4 wires into a variety of checking circuits, with a checking circuit outputting true if the opcode corresponded to the configuration of that circuit. Like this (yes, I've been watching Crash Course Computer Science):

[image: diagram of an opcode-checking circuit built from logic gates]

And I think I understand how, in broad terms, the location address 1110 would be sent along 4 wires that feed into multiplexers attached to four latch matrices, causing address location 1110 to be accessed in each of those matrices, each of which then feeds back whether its data bit at that location was a 1 or 0 / on or off.
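
In software terms, my mental model is something like this toy sketch (a made-up 4-bit machine for illustration only, not any real hardware):

```python
# Hypothetical machine: an 8-bit instruction splits into a 4-bit opcode
# and a 4-bit address, matching the 00101110 example above.

def decode(instruction):
    """Split an 8-bit instruction into opcode and address fields."""
    opcode = (instruction >> 4) & 0b1111   # the upper 4 wires
    address = instruction & 0b1111         # the lower 4 wires
    return opcode, address

# A "checking circuit" for opcode 0010: outputs true only for that exact
# bit pattern, like an AND gate fed through selectively inverted inputs.
def is_load_opcode(opcode):
    return opcode == 0b0010

# 16 addressable cells standing in for the latch matrices.
ram = [0] * 16
ram[0b1110] = 42

opcode, address = decode(0b00101110)
if is_load_opcode(opcode):
    value = ram[address]   # the multiplexers select cell 1110
    print(value)           # -> 42
```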

What I'm saying is that I think I can begin to see, or at least imagine, how a processor 'processes' a binary number, on the level of hardware/circuitry, without it seeming like magic.

My question is, can someone explain to me on this level how a CPU, as part of an assembler, would translate assembly code?

For example, how would the circuitry take MOV EAX, [EBX] and act on that as an instruction? I know that it would parse it, etc., but HOW does it parse it, on the level of wiring? Like how does it take a 'MOV' and translate that into the correct configuration of on/off wires?

On a related note, obviously the 'MOV' isn't stored as 'MOV' in the computer's memory - it's stored in binary. So if it's already stored in binary, why do we need to bother to translate it to a different binary configuration using an assembler?

Major

5 Answers

4

An assembler is a program that reads assembly language commands and translates them into a sequence of binary instructions, addresses and data values that is called machine code. The machine code is stored in the computer's memory and can be executed by the computer at some later time. Machine code is read and "understood" directly by the CPU.

So a command such as "load the value 10 into register A" might be written in assembly language as "LDA 10" and then stored in machine code as one byte 00101010. The first four bits of the machine code instruction 0010 represent the LDA instruction and the second four bits 1010 represent the value that is to be loaded.

Note that the assembler makes life easier for the programmer by translating the "LDA" instruction and translating the value 10 from decimal to binary. It will also do other stuff like allowing the programmer to use labels, which it then translates into specific memory addresses.
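
In software terms, that translation might look like this toy sketch (the opcode table is hypothetical, matching the LDA example above):

```python
# Sketch of a one-instruction assembler for the hypothetical machine above.
OPCODES = {"LDA": 0b0010}   # mnemonic -> 4-bit opcode (made up for the example)

def assemble_line(line):
    """Translate e.g. 'LDA 10' into the byte 00101010."""
    mnemonic, operand = line.split()
    opcode = OPCODES[mnemonic]   # look up the text mnemonic
    value = int(operand)         # decimal text -> binary number
    return (opcode << 4) | (value & 0b1111)

byte = assemble_line("LDA 10")
print(f"{byte:08b}")   # -> 00101010
```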

In the simplest CPU architecture, when the CPU executes the instruction 00101010 it actually runs a sequence of low level microinstructions which will be something like this:

  1. Add 1 to the register that tells the CPU where the next instruction is stored in memory
  2. Set a control line to take control of the data bus.
  3. Load the lowest four bits of the machine code instruction onto the data bus.
  4. Release control of the data bus.
  5. Set a control line to tell Register A to read and store the value on the data bus.
  6. Read the next instruction from memory.
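
The steps above can be sketched in software, with plain variables standing in for the registers and the bus (a simplified model, not any real CPU):

```python
# Simplified model of the microinstruction sequence above.
memory = [0b00101010]   # program: LDA 10
pc = 0                  # register holding the address of the next instruction
register_a = 0
data_bus = 0

instruction = memory[pc]
pc += 1                            # 1. advance the program counter
data_bus = instruction & 0b1111    # 2-3. drive the low four bits onto the bus
register_a = data_bus              # 4-5. Register A latches the bus value
# 6. the fetch loop would now read memory[pc] again

print(register_a)   # -> 10
```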

In the very simplest/oldest CPU architecture this final translation from machine code to microinstructions is hard wired in logic gates.

A good guide to this sort of stuff for beginners is "But How Do It Know".

Bonus question: if an assembler is a program that creates other executable programs, how is the assembler created in the first place?

gandalf61
3

The CPU only understands machine code. Assembly language has to be translated (assembled) into machine code in order for the CPU to execute it.

Machine code isn't very user friendly. While the very first computers might have been programmed directly in machine code, this has no longer been the case for decades. First, assembly language was invented, and programs, called assemblers, were written that converted assembly language into machine code. The next innovation was higher-level programming languages, that were converted into machine code by a more sophisticated type of program called a compiler. Nowadays we have additional levels of sophistication, like bytecode and virtual machines.

Throughout all this historical development, one thing stayed the same: CPUs only understand machine code, which is a compact way of encoding instructions. Modern CPUs typically understand some dialect of an instruction set architecture like x86. There are only a few common instruction set architectures nowadays, which means that there are many different CPUs that accept, roughly speaking, the same machine code.

The same instruction set architecture can support several assembly languages. For example, there are two common syntax conventions for x86 assembly: AT&T and Intel (the latter used by assemblers such as NASM). Under the AT&T convention, mov %eax, %ebx means move the contents of the eax register into the ebx register; under the Intel convention, mov eax, ebx means a move in the opposite direction. The CPU is completely oblivious to such differences, since all it ever sees is machine code. The machine code produced by AT&T mov %eax, %ebx and Intel mov ebx, eax is identical: 89C3 in hex, or 1000100111000011 in binary.
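
You can see how that 89C3 encoding falls out of the x86 scheme for register-to-register moves with a small sketch (register numbers are from the x86 manuals; everything else here is a simplified illustration, not a full encoder):

```python
# Sketch of encoding Intel-syntax "mov dest, src" (register to register)
# using opcode 0x89 (MOV r/m32, r32) and a ModRM byte.
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}   # x86 register numbers

def encode_mov_reg_reg(dest, src):
    opcode = 0x89
    # ModRM: mod=11 (register operand), reg=source, rm=destination
    modrm = (0b11 << 6) | (REGS[src] << 3) | REGS[dest]
    return bytes([opcode, modrm])

code = encode_mov_reg_reg("ebx", "eax")   # Intel: mov ebx, eax
print(code.hex())   # -> 89c3
```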

Yuval Filmus
2

That translation isn’t done by the CPU when it executes the instructions. It is done a lot earlier, when a program called an “assembler” translates the assembly instructions into sequences of bits that the CPU can execute.

You say “it’s stored in binary”. Yes, it is translated from assembler to binary, and the binary code is stored.

How does the assembler do it? Write a C program that reads a decimal number as text, say the characters “1”, “2”, “3”, “4”, and produces the binary number 1234. An assembler is just the same, only more complicated.
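
That conversion step can be sketched like this (in Python rather than C, for brevity):

```python
def parse_decimal(text):
    """Turn the characters '1', '2', '3', '4' into the number 1234."""
    value = 0
    for ch in text:
        digit = ord(ch) - ord("0")   # character code -> digit value
        value = value * 10 + digit
    return value

print(parse_decimal("1234"))   # -> 1234
```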

gnasher729
0

I couldn't find an answer to this question online, but I think I've figured it out.

1. When you type any alphabetical letter on your keyboard, you are actually sending a binary code, NOT the letter itself. So...

2. The ALU of the computer is a hardware circuit that executes the instructions it was built for. The ALU will NOT understand 'MOV' as letters.

3. So basically, what developers did in the early stages of making computer languages was write an assembler. The assembler is software, and what it does is actually simple: it is a program (itself already stored in binary in the computer's memory) that takes the text 'MOV' and translates it into the binary machine-code instruction.

It is something similar to this:

When you type the MOV command on your keyboard in assembly language, before it reaches the ALU it goes through this software, which performs the translation through circuits.

Entered from the keyboard, the MOV command arrives as character codes: M = 01001101, O = 01001111, V = 01010110.

Notice: the MOV command in assembly language is not the same thing as the string "MOV", which has a different value.

The assembler takes these three consecutive character codes and produces the actual binary command that is sent to the ALU.

In general, I think both of these stages are done in the processor and depend on how it is built.

Supporting this idea, you will find that some processors do not use the same commands and probably can't understand the same assembly language, since they are built on an entirely different set of instructions...

Or at least that's how I think about it; I'm not sure.
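
The character codes in this answer are just ASCII; you can check them like this:

```python
# The letters M, O, V as the keyboard actually sends them: ASCII codes.
for ch in "MOV":
    print(ch, format(ord(ch), "08b"))
# M 01001101
# O 01001111
# V 01010110
```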

0

The assembler translates human-readable instructions into machine-language code, which consists of mere binary numbers. Each code is made of an opcode (the identification number of a specific operation) followed by arguments (a register number, memory address, or immediate value).

The processor fetches the opcodes one after another and has logic to recognize them and trigger a sequencer that knows where and when to fetch the remaining parts of the instruction, then executes them by fetching data from the appropriate source, directing it to the relevant section of the ALU, and moving the results where appropriate.