193

Taking a look at Julia's webpage, you can see benchmarks of several languages across several algorithms (timings shown below). How can a language whose compiler was originally written in C outperform C code?

[Figure: benchmark times relative to C (smaller is better, C performance = 1.0).]

StrugglingProgrammer

10 Answers

289

There is no necessary relation between the implementation of the compiler and the output of the compiler. You could write a compiler in a language like Python or Ruby, whose most common implementations are very slow, and that compiler could output highly optimized machine code capable of outperforming C. The compiler itself would take a long time to run, because its code is written in a slow language. (To be more precise, written in a language with a slow implementation. Languages aren't really inherently fast or slow, as Raphael points out in a comment. I expand on this idea below.) The compiled program would be as fast as its own implementation allowed—we could write a compiler in Python that generates the same machine code as a Fortran compiler, and our compiled programs would be as fast as Fortran, even though they would take a long time to compile.
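To make that concrete, here is a minimal sketch of a deliberately slow "compiler" (written in C only for concreteness; imagine the same program written in the slowest language you like, since everything in it is my own illustration): it does its "compile-time" work with a naive loop, yet the program it emits contains only a precomputed constant and runs in O(1). However slowly the compiler runs, the emitted program is just as fast.

```c
#include <stdio.h>

/* A toy "compiler": it translates the specification "sum the integers
   from 1 to n" into a C program. The compiler does the work with a
   naive loop, but the program it EMITS contains only the precomputed
   constant, so the emitted program's speed owes nothing to ours. */
int main(void) {
    long n = 100000000L, sum = 0;
    for (long i = 1; i <= n; i++)   /* slow "compile-time" work */
        sum += i;

    /* Emit the target program. */
    printf("#include <stdio.h>\n"
           "int main(void) { printf(\"%%ld\\n\", %ldL); return 0; }\n",
           sum);
    return 0;
}
```

Redirect the output to a file, compile that file with any C compiler, and the resulting binary prints the answer instantly, regardless of how long the "compilation" above took.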

It's a different story if we're talking about an interpreter. Interpreters have to be running while the program they're interpreting is running, so there is a connection between the language in which the interpreter is implemented and the performance of the interpreted code. It takes some clever runtime optimization to make an interpreted language which runs faster than the language in which the interpreter is implemented, and the final performance can depend on how amenable a piece of code is to this kind of optimization. Many languages, such as Java and C#, use runtimes with a hybrid model which combines some of the benefits of interpreters with some of the benefits of compilers.

As a concrete example, let's look more closely at Python. Python has several implementations. The most common is CPython, a bytecode interpreter written in C. There's also PyPy, which is written in a specialized dialect of Python called RPython, and which uses a hybrid compilation model somewhat like the JVM. PyPy is much faster than CPython in most benchmarks; it uses all sorts of amazing tricks to optimize the code at runtime. However, the Python language which PyPy runs is exactly the same Python language that CPython runs, barring a few differences which don't affect performance.

Suppose we wrote a compiler in the Python language for Fortran. Our compiler produces the same machine code as GFortran. Now we compile a Fortran program. We can run our compiler on top of CPython, or we can run it on PyPy, since it's written in Python and both of these implementations run the same Python language. What we'll find is that if we run our compiler on CPython, then run it on PyPy, then compile the same Fortran source with GFortran, we'll get exactly the same machine code all three times, so the compiled program will always run at around the same speed. However, the time it takes to produce that compiled program will be different. CPython will most likely take longer than PyPy, and PyPy will most likely take longer than GFortran, even though all of them will output the same machine code at the end.

From scanning the Julia website's benchmark table, it looks like none of the languages running on interpreters (Python, R, Matlab/Octave, Javascript) have any benchmarks where they beat C. This is generally consistent with what I'd expect to see, although I could imagine code written with Python's highly optimized Numpy library (written in C and Fortran) beating some possible C implementations of similar code. The languages which are equal to or better than C are being compiled (Fortran, Julia) or using a hybrid model with partial compilation (Java, and probably LuaJIT). PyPy also uses a hybrid model, so it's entirely possible that if we ran the same Python code on PyPy instead of CPython, we'd actually see it beat C on some benchmarks.

tsleyson
101

How can a machine built by a man be stronger than a man? This is exactly the same question.

The answer is that the output of the compiler depends on the algorithms implemented by that compiler, not on the language used to implement it. You could write a really slow, inefficient compiler that produces very efficient code. There's nothing special about a compiler: it's just a program that takes some input and produces some output.

David Richerby
94

I want to make one point against a common assumption which is, in my opinion, fallacious to the point of being harmful when choosing tools for a job.

There is no such thing as a slow or fast language.¹

On our way to the CPU actually doing something, there are many steps².

  1. At least one programmer with certain skillsets.
  2. The (formal) language they program in ("source code").
  3. The libraries they use.
  4. Something that translates source code into machine code (compilers, interpreters).
  5. The overall hardware architecture, e.g. number of processing units and layout of the memory hierarchy.
  6. The operating system which manages the hardware.
  7. On-CPU optimizations.

Every single item contributes to the actual runtime you can measure, sometimes heavily. Different "languages" focus on different things³.

Just to give some examples.

  • 1 vs 2-4: an average C programmer is likely to produce far worse code than an average Java programmer, both in terms of correctness and efficiency. That is because the programmer has more responsibilities in C.

  • 1/4 vs 7: in a low-level language like C, you may be able to exploit certain CPU features as a programmer. In higher-level languages, only the compiler/interpreter can do so, and only if they know the target CPU.

  • 1/4 vs 5: do you want or have to control the memory layout in order to best use the memory architecture at hand? Some languages give you control over that, some don't.

  • 2/4 vs 3: Interpreted Python itself is horribly slow, but there are popular bindings to highly optimized, natively compiled libraries for scientific computing. So doing certain things in Python is fast in the end, if most of the work is done by these libraries.

  • 2 vs 4: The standard Ruby interpreter is quite slow. JRuby, on the other hand, can be very fast. That is, the same language can be fast with a different compiler/interpreter.

  • 1/2 vs 4: Using compiler optimisations, simple code can be translated into very efficient machine code (see the sketch after this list).
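
As a sketch of that last point (illustrative only; the exact transformation depends on compiler and flags, but GCC and Clang perform this kind of idiom recognition at -O2):

```c
/* Simple, obvious source code... */
unsigned sum_below(unsigned n) {
    unsigned s = 0;
    for (unsigned i = 0; i < n; i++)
        s += i;
    return s;
}
/* ...which an optimising compiler can replace with straight-line code
   equivalent to the closed form n * (n - 1) / 2: no loop at all. */
```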

The bottom line is, the benchmark you found does not make much sense, at least not when boiled down to that table you include. Even if all you are interested in is running time, you need to specify the whole chain from programmer to CPU; swapping out any of the elements can change the results dramatically.

To be clear, this answers the question because it shows that the language the compiler (step 4) is written in is but one piece of the puzzle, and probably not relevant at all (see other answers).


  1. There certainly are language features that are more costly to implement than others. But the existence of features does not mean you have to use them, and an expensive feature may save the use of many cheaper ones and thus pay off in the end (or have other advantages not measurable in running time).
  2. I skip over the algorithmic level because it does not always apply and is mostly independent of the programming language used. Keep in mind that different algorithms lend themselves better to different hardware, for instance.
  3. I deliberately don't go into different success metrics here: running time efficiency, memory efficiency, developer time, security, safety, (provable?) correctness, tool support, platform independency, ...

    Comparing languages w.r.t. one metric even though they have been designed for completely different goals is a huge fallacy.

Raphael
23

There is one thing about optimisation that has been forgotten here.

There was a longish debate about Fortran outperforming C. Setting the malformed parts of that debate aside: the same code was written in C and Fortran (or so the testers thought), and performance was tested on the same data. The problem is that the languages differ: C allows pointer aliasing, while Fortran does not.

So the programs were not the same: the tested C files had no restrict (or __restrict) qualifiers, and that is what made the difference. After the files were rewritten to tell the compiler that it may optimise the pointer accesses, the runtimes became similar.
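
As a sketch of what the fix looks like (the function below is illustrative, not the benchmark's actual code): with restrict-qualified pointers, the C compiler may assume the arrays are disjoint, which is the guarantee Fortran gives by default for its arguments, and may then keep values in registers or vectorise the loop.

```c
/* Without restrict, the compiler must assume out, a and b may alias:
   a store to out[i] could change a[] or b[], blocking optimisation. */
void axpy_aliased(double *out, const double *a, const double *b, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + 2.0 * b[i];
}

/* With C99 restrict (GCC and Clang also accept __restrict), the
   compiler may assume the arrays are disjoint and vectorise freely. */
void axpy_restrict(double *restrict out, const double *restrict a,
                   const double *restrict b, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + 2.0 * b[i];
}
```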

The point here is that some optimisation techniques are easier (or only become legal) in a newly created language.

It is also possible, in the long run, for a VM with a JIT to outperform C. There are two possibilities.
First, JIT-compiled code can take advantage of the machine that hosts it (for example some SSE$x$ extension or other vectorised instructions exclusive to certain CPUs) that the compared program was not built to use.

Secondly, a VM can profile the program while it runs, so it can take the hot code and optimise it, or even precalculate parts of it at runtime. An ahead-of-time compiled C program does not know in advance where the hot spots will be, and (most of the time) the executable is a generic version built for a whole family of machines.

The test also includes JavaScript; there are faster VMs than V8, and JavaScript too runs faster than C in some tests.

I have checked this, and the VMs used optimisation techniques that are not yet available in C compilers.

A C compiler would have to statically analyse the whole program at once, target one given platform (think of the -march flag), and work around memory-alignment problems.

A VM, by contrast, just translates the relevant part of the code into optimised assembly and runs it.

About Julia: as far as I checked, it operates on the AST of the code; GCC, for example, skipped this step and only recently started to take some information from it. This, plus other constraints and VM techniques, might explain the results a bit.

Example: let us take a simple loop that takes its starting and ending points from variables, and that uses values which become known only at runtime.

A C compiler generates code that loads these variables from registers or memory.
But at runtime these variables are known, and they do not change during execution.
So instead of loading the variables on every use (and forgoing caching, because static analysis cannot prove they stay fixed), a JIT can treat them fully as constants, and fold and propagate them.
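
A sketch of the difference (the values are made up for illustration): the first function is all an ahead-of-time C compiler can emit when the bounds are unknown; the second is what a JIT may emit once it has observed the actual values.

```c
/* Ahead-of-time: lo, hi and k are unknown, so they stay in
   registers/memory and are consulted on every iteration. */
long sum_general(long lo, long hi, long k) {
    long s = 0;
    for (long i = lo; i < hi; i++)
        s += i * k;
    return s;
}

/* What a JIT may emit after observing lo == 0, hi == 1000000, k == 3:
   for this run the variables are constants, so the whole computation
   can be folded and propagated, here all the way down to the answer. */
long sum_specialized(void) {
    return 1499998500000L;  /* 3 * (999999 * 1000000 / 2), precomputed */
}
```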

Evil
12

The previous answers pretty much give the explanation, though mostly from a pragmatic angle, insofar as the question makes sense at all, as Raphael's answer excellently explains.

Adding to this answer, we should note that, nowadays, C compilers are written in C. Of course, as noted by Raphael, their output and its performance may depend, among other things, on the CPU it is running on. But it also depends on the amount of optimization done by the compiler. If you write in C a better optimizing compiler for C (which you then compile with the old one to be able to run it), you get a new compiler that makes C a faster language than it was before. So, what is the speed of C? Note that you can even compile the new compiler with itself, as a second pass, so that it compiles more efficiently, though still producing the same object code. And the full employment theorem shows that there is no end to such improvements (thanks to Raphael for the pointer).

But I think it may be worthwhile trying to formalize the issue, as it illustrates very well some fundamental concepts, particularly the denotational versus operational view of things.

What is a compiler?

A compiler $C_{S\to T}$, abbreviated to $C$ if there is no ambiguity, is a realization of a computable function $\mathcal C_{S\to T}$ that translates a program text $P_{:S}$, written in a source language $S$ and computing a function $\mathcal P$, into a program text $P_{:T}$ written in a target language $T$ that is supposed to compute the same function $\mathcal P$.

From a semantic point of view, i.e. denotationally, it does not matter how this compiling function $\mathcal C_{S\to T}$ is computed, i.e., what realization $C_{S\to T}$ is chosen. It could even be done by a magic oracle. Mathematically, the function is simply a set of pairs $\{(P_{:S},P_{:T})\mid P_{:S}\in S \wedge P_{:T}\in T\}$.

The semantic compiling function $\mathcal C_{S\to T}$ is correct if both $P_{:S}$ and $P_{:T}$ compute the same function $\mathcal P$. But this formalization applies as well to an incorrect compiler. The only point is that whatever is implemented achieves the same result independently of the implementation means. What matters semantically is what is done by the compiler, not how (and how fast) it is done.

Actually getting $P_{:T}$ from $P_{:S}$ is an operational issue that must be solved. This is why the compiling function $\mathcal C_{S\to T}$ must be a computable function. Then any language with Turing power, no matter how slow, is sure to be able to produce code as efficient as any other language, even if it may do so less efficiently.

Refining the argument, we probably want the compiler to have good efficiency, so that the translation can be performed in reasonable time. So the performance of the compiler program matters for users, but it has no impact on semantics. I say "performance" because the theoretical complexity of some compilers can be much higher than one would expect.

About bootstrapping

This will illustrate the distinction, and show a practical application.

It is now commonplace to first implement a language $S$ with an interpreter $I_S$, and then write a compiler $C_{S\to T\,:S}$ in the language $S$ itself. This compiler $C_{S\to T\,:S}$ can be run with the interpreter $I_S$ to translate any program $P_{:S}$ into a program $P_{:T}$. So we do have a running compiler from language $S$ to (machine?) language $T$, but it is very slow, if only because it runs on top of an interpreter.

But you can use this compiling facility to compile the compiler $C_{S\to T\,:S}$ itself, since it is written in language $S$, and thus you get a compiler $C_{S\to T\,:T}$ written in the target language $T$. If you assume, as is often the case, that $T$ is a language that is more efficiently interpreted (machine native, for example), then you get a faster version of your compiler running directly in language $T$. It does exactly the same job (i.e., produces the same target programs), but it does it more efficiently.
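
In the notation introduced above, the bootstrap is a single application of the compiling function to the compiler's own source text:

$$\mathcal C_{S\to T}\bigl(C_{S\to T\,:S}\bigr) = C_{S\to T\,:T}$$

Both $C_{S\to T\,:S}$ and $C_{S\to T\,:T}$ realize the same function $\mathcal C_{S\to T}$, so they produce identical target programs; only the speed at which they produce them differs.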

babou
6

By Blum's speedup theorem, there are problems for which a program, written for and run on the very fastest computer/compiler combination, will run slower than a program for the same problem running in interpreted BASIC on your first PC. There just isn't a "fastest language". All you can say is that if you write the same algorithm in several language implementations (as noted, there are plenty of different C compilers around, and I even came across a rather capable C interpreter), it will run faster or slower in each.

There can't be an "always slower" hierarchy. This is a phenomenon everybody fluent in several languages is aware of: each programming language was designed for a specific type of application, and the most-used implementations have been lovingly optimized for that type of program. I'm pretty sure that, e.g., a program for fooling around with strings written in Perl will probably beat the same algorithm written in C, while a program munching on large arrays of integers in C will be faster than in Perl.

vonbrand
5

Let's go back to the original line: "How can a language whose compiler is written in C ever be faster than C?" I think this really meant to say: how can a program written in Julia, whose core is written in C, ever be faster than a program written in C? Specifically, how could the "mandel" program as written in Julia run in 87% of the execution time of the equivalent "mandel" program written in C?

Babou's treatise is the only correct answer to this question so far. All the other responses so far are more or less answering other questions. The problem with babou's text is that the many-paragraphs-long theoretical description of "What is a compiler" is written in terms that the original poster will probably have trouble understanding. Anyone who grasps the concepts referred to by the words "semantic", "denotationally", "realization", "computable" and so on will already know the answer to the question.

The simpler answer is that neither C code nor Julia code is directly executable by the machine. Both have to be translated, and that translation process introduces many ways in which the executable machine code can be slower or faster while still producing the same end result. Both C and Julia are compiled, which means a series of translations to another form: commonly, a human-readable text file is translated into some internal representation and then written out as a sequence of instructions that the computer can understand directly. With some languages there's more to it than that, and Julia is one of these: it has a "JIT" compiler, which means the whole translation process doesn't have to happen all at once for the entire program. But the end result for any language is machine code that needs no further translation, code that can be sent directly to the CPU to make it do something. In the end, THIS is the "computation", and there is more than one way to tell a CPU how to get the answer you want.

One could imagine a programming language that has both a "plus" and a "multiply" operator, and another language that has only "plus". If your computation requires multiplication, one language will be "slower" because of course the CPU can do both directly, but if you don't have any way to express the need to multiply 5 * 5, you are left having to write "5 + 5 + 5 + 5 + 5". The latter will take more time to arrive at the same answer. Presumably, there's some of this going on with Julia; perhaps the language allows the programmer to state the desired goal of computing a Mandelbrot set in a way that isn't possible to directly express in C.
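Here is a toy version of that thought experiment in C (both functions are my own illustration): each computes x squared, but the second is restricted to expressing multiplication through repeated addition, as a plus-only language would force.

```c
/* With a multiply operator: typically a single imul instruction. */
int square_mul(int x) { return x * x; }

/* Without one: x additions in a loop. Same answer, more work,
   unless the optimiser happens to recognise the idiom. */
int square_add(int x) {
    int s = 0;
    for (int i = 0; i < x; i++)
        s += x;
    return s;
}
```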

The processor used for the benchmark was listed as a Xeon E7-8850 2.00GHz CPU. The C benchmark used the gcc 4.8.2 compiler to produce instructions for that CPU, while Julia uses the LLVM compiler framework. It's possible that gcc's backend (the part that produces machine code for a particular CPU architecture) isn't as advanced in some way as the LLVM backend. That could make a difference in performance. There are also many other things going on: the compiler can "optimize" by issuing instructions in a different order than specified by the programmer, or even by not doing some things at all if it can analyze the code and determine they're not required to get the right answer. And the programmer might have written part of the C program in a way that makes it slow, but avoided such mistakes in the Julia code. For example, the order in which elements of a two-dimensional array are accessed might have no bearing on the end result of the computation, but it might well make a difference in speed (see "Row major versus Column major layout of matrices", and the sketch below).
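
A sketch of that last pitfall (the array size is chosen arbitrarily): both functions compute the same sum over a C array, which is stored row-major, but they touch memory in very different orders.

```c
#define N 1024
static double a[N][N];  /* row-major: a[i][j] and a[i][j+1] are adjacent */

/* Row-order traversal walks memory sequentially: cache friendly. */
double sum_rows(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-order traversal jumps N doubles between accesses; the result
   is identical, but on large arrays it is typically much slower. */
double sum_cols(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}
```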

All of these are ways of saying: there are lots of ways to write machine code to compute a Mandelbrot set, and the language you use has a major effect on how that machine code gets written. The more you understand about compilation, instruction sets, caches, and so on, the better equipped you will be to get the results you want. The major takeaway from the benchmark results cited for Julia is that no one language or tool is best at everything. In fact the best speed factor in the entire chart was for Java!

2

The speed of a compiled program depends on two things:

  1. The performance characteristics of the machine executing it
  2. The contents of the executable file

The language a compiler is written in is irrelevant to (1). For example, a Java compiler can be written in C or Java or Python, but in all cases the "machine" executing the program is the JVM.

The language a compiler is written in is irrelevant to (2). For example, there is no reason why a C compiler written in Python cannot output exactly the same executable file as a C compiler written in C or Java.

Artelius
1

I'll try to offer a shorter answer.

The core of the question lies in the definition of "speed" of a language.

Most if not all speed-comparison tests don't test the maximum possible speed. Instead, the testers write a small program in the language they want to test, to solve a problem. When writing the program, the programmer uses what they assume* to be the best practices and conventions of the language at the time of the test. Then they measure the speed at which the program executes.

*The assumptions are occasionally wrong.

Peter
0

Code written in a language X whose compiler is written in C can outperform code written in C, provided the C compiler does poorer optimization than the compiler of language X. And even if we keep optimization out of the discussion: if the compiler of X can generate better object code than the C compiler generates, code written in X may still win the race.

But if language X is an interpreted language whose interpreter is written in C, and if we assume that the interpreter of language X and the competing C code are compiled by the same C compiler, then code written in X is in no way going to outperform code written in C, provided both implementations follow the same algorithm and use equivalent data structures.

jayraj