27

I would like to ask a few questions about Assembly language. My understanding is that it's very close to machine language, making it faster and more efficient.

Since different computer architectures exist, does that mean I have to write different Assembly code for each architecture? If so, why isn't Assembly a write-once, run-everywhere kind of language? Wouldn't it be easier to simply make it universal, so that you write it only once and can run it on virtually any machine with different configurations? (I suspect that it would be impossible, but I would like some concrete, in-depth answers.)

Some people might say C is the language I'm looking for. I haven't used C before but I think it's still a high-level language, although probably faster than Java, for example. I might be wrong here.

Raphael
nTuply

12 Answers

48

Assembly language is a way to write instructions for the computer's instruction set, in a way that's slightly more understandable to human programmers.

Different architectures have different instruction sets: the set of allowed instructions is different on each architecture. Therefore, you can't hope to have a write-once-run-everywhere assembly program. For instance, the set of instructions supported by x86 processors looks very different from the set of instructions supported by ARM processors. If you wrote an assembly program for an x86 processor, it'd have lots of instructions that are not supported on the ARM processor, and vice versa.

The core reason to use assembly language is that it gives you very low-level control over your program and lets you take advantage of all of the instructions of the processor: by customizing the program to exploit features that are unique to the particular processor it will run on, you can sometimes speed the program up. The write-once-run-everywhere philosophy is fundamentally at odds with that.

svick
D.W.
14

The DEFINITION of assembly language is that it is a language that can be translated directly to machine code. Each operation code in assembly language translates to exactly one operation on the target computer. (Well, it's a little more complicated than that: some assemblers automatically determine an "addressing mode" based on arguments to an op-code. But still, the principle is that one line of assembly translates to one machine-language instruction.)

You could, no doubt, invent a language that would look like assembly language but would be translated to different machine codes on different computers. But by definition, that wouldn't be assembly language. It would be a higher-level language that resembles assembly language.

Your question is a little like asking, "Is it possible to make a boat that doesn't float or have any other way to travel across water, but has wheels and a motor and can travel on land?" The answer would be that by definition, such a vehicle would not be a boat. It sounds more like a car.

Jay
9

There is no conceptual (I daresay, no computer science) reason against having one assembly language for all computers in the world. In fact, that would make many things much easier. As far as theory is concerned, they are all the same, anyway, up to some funky bijection.

In practice, however, there are different chips for different purposes, with different operations and design principles (e.g. RISC vs CISC) that serve different goals, and the instruction sets that operate them and therewith the assembly languages differ. In the end, the answer is the same as when asking why there are so many different programming languages: different goals, different design decisions.

That said, you can of course introduce levels of abstraction to get to some shared interface. x86, for instance, has not been implemented directly at the chip level for quite some time; a little piece of hardware translates x86 instructions into whatever your processor really works with internally. Languages like C would be another step away from the hardware (if an arguably tiny one), all the way up to languages like Haskell, Java or Ruby. Yes, compilers are among the main achievements of computer science, because they make it possible to separate concerns in this fashion.

Raphael
7

You mention the phrase "write once, run anywhere" without seeming to notice its significance. That was the marketing slogan of Sun Microsystems, which commercially pioneered the concepts of a "virtual machine" and "bytecode" for Java, although the idea may have originated in academia first. The idea was later copied by Microsoft for .NET after Sun successfully sued them over Java licensing. Java bytecode is an implementation of the idea of a cross-machine assembly or machine language. It is used by several languages other than Java and can in theory serve as a compilation target for any language. After many years of very advanced optimization, Java comes close in performance to compiled languages, showing that the goal of a high-performance, platform-agnostic virtual machine is achievable in general.

Another idea, still in its early stages, that relates to your requirements is the recomputation project. It is aimed at scientific research, although it could be used for other purposes: the goal is to make computational experiments replicable via virtual machine technology, essentially by simulating different machine architectures on arbitrary hardware.

peterh
vzn
5

High level reasons

When you think about it, a microprocessor does an amazing thing: it lets you take a machine (such as a washing machine or an elevator), and replace a whole chunk of custom-designed mechanisms or circuits with a cheap, mass-produced silicon chip. You save a lot of money on parts, and a lot of time on design.

But hang on, a standard chip, replacing countless custom designs? There can't be a single microprocessor that is perfect for every application. Some applications need to minimise power usage but don't need to be fast; others need to be fast but don't need to be easy to program; others need to be low cost; and so on.

So, we have many different "flavours" of microprocessor, each with its own strengths and weaknesses. It is desirable for them all to use a compatible instruction set, because this allows code reuse and makes it easier to find people with the right skills. However, the instruction set does affect the cost, complexity, speed, ease-of-use, and physical constraints of the processor, and so we have a compromise: there are a few "mainstream" instruction sets (and many minor ones), and within each instruction set there are many processors with different characteristics.

Oh, and as technology changes, all these trade-offs change, so instruction sets evolve, new ones emerge, and old ones die. Even if there were a "best" instruction set today, it might not be the best in 20 years.

Hardware details

Probably the biggest design decision in an instruction set is the word size, i.e. how large a number the processor can "naturally" manipulate. 8-bit processors deal with numbers from 0-255, whereas 32-bit processors deal with numbers from 0 to 4,294,967,295. Code designed for one needs to be completely rethought for another.

It is not just a matter of translating instructions from one instruction set to another. A completely different approach may be preferable in a different instruction set. For example, on an 8-bit processor a lookup table may be ideal, while on a 32-bit processor an arithmetic operation would be better for the same purpose.

There are other major differences between instruction sets. Most instructions fall into four categories:

  • Computation (Arithmetic and logic)
  • Control Flow
  • Data Transfer
  • Processor configuration

Processors differ in what sort of computations they can perform, as well as how they approach control flow, data transfer, and processor configuration.

For example, some AVR processors can neither multiply nor divide; whereas all x86 processors can. As you may imagine, eliminating the circuitry required for tasks like multiplication and division can make a processor simpler and cheaper; these operations can still be performed using software routines if they are needed.

x86 allows arithmetic instructions to load their operands from memory and/or save their results to memory; ARM is a load-store architecture and thus only has a few dedicated instructions for accessing memory. Meanwhile x86 has dedicated conditional-branch instructions, while ARM allows practically all instructions to be conditionally executed. Also, ARM allows bit-shifts to be performed as part of most arithmetic instructions. These differences lead to different performance characteristics, differences in internal design and cost of the chips, and differences in programming techniques at the assembly language level.

Conclusion

The reason it is impossible to have a universal assembly language is that, to properly convert assembly code from one instruction set to another, one must design the code all over again—something computers cannot yet do.

Artelius
4

Adding to the marvelous answer by D.W.: if you wanted to have one assembler, it would need to support all architectures, translate perfectly among them, and fully understand what you are doing.
Code heavily optimized for one architecture would need to be de-optimized, understood at a more abstract level, and re-optimized for another.
But if this were possible, we would have a perfect C compiler, and writing in pure assembly would not be beneficial at all.
The main point of using assembler is performance that cannot be squeezed out of current compilers.
Writing such a program would be even harder than writing existing compilers, and keeping up with every new architecture being created would make it harder still.
And for a "one and only" program, it would also require full backwards compatibility.

Evil
3

As noted, LLVM is the closest thing to this so far. A big barrier to a truly universal language is the set of fundamental trade-offs involved: concurrency, memory use, throughput, latency, and power consumption. If you write in an explicitly SIMD style, you could use too much memory. If you write in an explicitly SISD style, you get suboptimal parallelization. If you optimize for throughput, you hurt latency. If you maximize single-threaded throughput (i.e., clock speed), you hurt battery life.

At the very least, the code would need to be annotated with the trade-offs. What may be most important is for the language to have good algebraic/type properties that give the compiler a lot of wiggle room to optimize and detect logical inconsistency.

Then there is the question of undefined behavior. Much of the speed of C and assembly languages comes from undefined behavior. If you admit undefined behavior that actually does happen, then you end up handling it as special cases (i.e., architecture- and context-specific hacks).

Rob
1

I'm going to go against every other answer posted so far; they were reasonable at the time, but are now somewhat obsolete. I posit that the true answer is, in fact:

It's totally possible, and some very big companies went ahead and did exactly that.

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

In other words, WebAssembly is a bytecode format which is designed to be easy for C++-adjacent languages to compile into, while also being easy to run safely on your browser.

Now, one might argue that WebAssembly isn't a "universal assembly language". First, it's not exactly assembly, in the sense that no processor can run it as-is; its instructions have to be compiled to machine code first, usually with additional overhead in the form of safety checks. Second, it's not inherently universal, in that there are lots of architectures it can't be compiled to (GPUs and FPGAs come to mind).

In that regard, the other answers are still relevant: WebAssembly makes a bunch of trade-offs, sacrificing some performance for safety and portability. Writing WebAssembly and compiling it to regular assembly is always going to produce a less efficient program than directly writing assembly.

I'd still argue that it's pretty damn close on both criteria:

  • It's assembly-like, in that most high-level information has already been compiled away. It doesn't care about type information, generics, casting, virtual dispatch, etc. It can only read from and write to a contiguous range of data called linear memory; and since it doesn't know how that memory is structured, it only needs to worry about preventing out-of-bounds accesses; structuring reads and writes is the responsibility of the code being compiled into wasm, much like with regular machine code. Thus, compiling from WebAssembly is a lot cheaper and faster than compiling from C++, which is why your browser can afford to ship with a wasm compiler.

  • It's universal-like, in that it can be made to run on any machine that we'd expect C++ (and every other generalist programming language on the market) to run on. It uses very simple primitives that match pretty closely to the physical architecture of most CPUs (RAM, registers, branching, etc.).

There are other arguments for wasm adoption (it's open-source, unlike Java bytecode; it's widely supported; etc.), but the two points above are the most relevant to your question.

If what you're looking for is an extremely-low level code format, that can be compiled from most languages and run on most machines, then you want WebAssembly.

0

A Universal Assembly Language? Of course. By default, you just endow the language with a giant switch table; if (x86) then ... if (68x00) then ... if (AMD) then ... and maybe factor out the stuff that's in common between CPUs, and provide a facility for plug-ins to add new CPUs.

Essentially, most or all of that is what GAS already does.

You can make a Universal Anything this way; even a Universal Universal (a universal language for universal languages).

The question can be better posed as this: are there universal frameworks for assembly languages; in particular, general syntaxes that can be applied across the board? There are, in fact, three in common use, each of which could easily be adapted to all CPUs: the Intel Syntax (which uses "DB"s and "DS"s, ";" to head line comments, and ":" after labels), the Motorola Syntax (which uses "FCB"s, "*" to head line comments, strict formatting for labels and no ":"), and the VAX Syntax (which has relative labels, e.g. "1f" and "1b", and prefixes directives with dots), the last of which is what GAS uses.

We devised the CAS syntax, which has not (yet) been ported to a wide range of CPUs. It is closer to M4 in terms of directives (and may even be redesigned as a dialect of M4), uses C syntax for assembly-time expressions and operations, and is free-form (e.g. multiple statements per line are permitted). Comments are headed by ";;" or "//" or may be in "/* ... */" form, and it has the "1f" and "1b" relative labels of the VAX syntax.

On the other question that you're asking: why can't CPUs all agree to do the same things or things cut from the same cloth? The answer is simple: it's a free market, they're not supposed to.

The differences between CPUs involve matters that are critical to their design and intended uses! The 8051/2, for instance, has large numbers of bit-sized and byte-sized registers and register windows, very convenient for multi-threaded applications and run-time systems. This is also common with RISC processors. The CPUs intended for desktops or laptops (like the x86 family) have narrow register sets (and a corresponding bottleneck). The addressing modes are different; some have the equivalent of C's *a++ = b or *a-- = b, for instance. Processors in the x86 family from the 80386 onward have a large infrastructure for handling virtual memory, caching, paging, and for distinguishing between different levels of protection, while the 8086 and 80186 do not. The interrupt handling for different CPUs is different: some have fixed locations for interrupt handler functions, others have fixed locations only for the function pointers, while others have virtual or paged locations. The level of support by the CPU for "location-independent" programs varies significantly.

Even the instruction sets can have significant differences. A processor with bit-sized registers, for instance, doesn't need a zillion different conditional branch instructions, only a few. Those without bit-sized registers may, instead, have a version of the "compare" (CMP) and "test" (TST) operations, along with a large number of conditional branch instructions.

The destinations for goto's and jumps are addressed in different ways. Some use absolute addresses, some use paged addresses or virtual addresses.

If you're going to try and smooth over all those differences, then you're mission-creeping assemblers into being compilers, no matter how you try to frame it. That's the fallacious route that so-called "high-level assemblers" went down. If you're going to go that way, then you might as well just use a compiler equipped with the ability to inject in-line assembly and (ideally, especially for embedded systems) the ability to substitute your own custom run-time system/kernel for the compiler's default cookie-cutter one.

NinjaDarth
0

I'm going to be a bit contrary here and say yes, actually, it is possible. Especially so if we restrict ourselves to digital computing based on 8 bit bytes.

Pragmatic, practical, feasible? Those are very different questions, and your mileage may vary. However, it is certainly possible to

  • Enumerate all machine codes up to a given bit-length
  • Create a constant map from all assembly language to all ISA
  • Create a superset of all overlapping operations
  • Extend with unique entries for remaining operations
  • Label the set of all possible codes with the extended superset operations
  • Use maps to translate from universal codes to device specific codes at runtime initialization or firmware flash
  • Write programs

But good luck with that, friends.

0

Perhaps what you are looking for is a Universal Turing Machine notation where everyone agrees on the symbols for the commands. (https://en.wikipedia.org/wiki/Universal_Turing_machine)

An 'assembler' that translates a Turing-acceptable language to the underlying vendor-specific machine code could be built for any of those things we call computers.

In The Art of Computer Programming there is an example of what this might look like.

But consider the question "why isn't there a commercially available universal language that can be used with all computers?" I'd suggest the most dominant influences are (1) convenience: not all assembly languages are the most convenient to use; and (2) economics: incompatibility between machines of different brands and vendors is a business strategy, as well as the result of limited resources (time/money) to design machines.

Chris
0

Assumption: compiling and optimising a high-level language L1 to a lower-level language L0 is easier than compiling and optimising a higher-level language L2 (higher than L1) to L0; easier in the sense that you can supposedly generate more optimised code when compiling L1 to L0 than L2 to L0.

I think the assumption is probably correct; that is why most compilers use a low-level intermediate representation (an IR, such as LLVM IR).

If this is true, then one could take any low-level language L0 and write compilers to translate L0 to other low-level languages. For example, use the MIPS instruction set, and compile it to x86, ARM, POWER, ...

-Taoufik