I was trying to explain to someone that C is Turing-complete, and realized that I don't actually know if it is, indeed, technically Turing-complete. (C as in the abstract semantics, not as in an actual implementation.)

The "obvious" answer (roughly: it can address an arbitrary amount of memory, so it can emulate a RAM machine, so it's Turing-complete) isn't actually correct, as far as I can tell. Although the C standard allows size_t to be arbitrarily large, it must be fixed at some width, and whatever width it is fixed at, it is still finite. (In other words, given an arbitrary halting Turing machine, you could pick a width of size_t such that it will run "properly", but there is no single width of size_t for which all halting Turing machines will run properly.)

So: is C99 Turing-complete?

Raphael
TLW

14 Answers

I'm not sure but I think the answer is no, for rather subtle reasons. I asked on Theoretical Computer Science a few years ago and didn't get an answer that goes beyond what I'll present here.

In most programming languages, you can simulate a Turing machine by:

  • simulating the finite automaton with a program that uses a finite amount of memory;
  • simulating the tape with a pair of linked lists of integers, representing the content of the tape before and after the current position. Moving the pointer means transferring the head of one of the lists onto the other list.
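A minimal C sketch of the tape described in the list above, assuming symbols are plain ints and 0 is the blank; the names tape, cell, move_left and move_right are illustrative, not taken from the answer. Each move transfers one node between the two lists, and a blank node is allocated on demand when a list is empty, so the tape only grows as far as the machine actually walks:

```c
#include <stdlib.h>

/* One cell of a half-tape: a singly linked list node. */
typedef struct cell {
    int symbol;
    struct cell *next;
} cell;

/* `left` holds the cells before the head (nearest first); `right` holds the
   cell under the head followed by the cells after it. An empty list stands
   for an unbounded run of blank cells (symbol 0). */
typedef struct {
    cell *left;
    cell *right;
} tape;

static cell *push(cell *list, int symbol) {
    cell *c = malloc(sizeof *c);
    if (c == NULL) abort();   /* a concrete machine can run out of memory */
    c->symbol = symbol;
    c->next = list;
    return c;
}

int tape_read(const tape *t) {
    return t->right ? t->right->symbol : 0;
}

void tape_write(tape *t, int symbol) {
    if (t->right) t->right->symbol = symbol;
    else t->right = push(NULL, symbol);
}

/* Move right: transfer the head of `right` onto `left`. */
void move_right(tape *t) {
    if (t->right == NULL) t->right = push(NULL, 0);   /* materialise a blank */
    cell *c = t->right;
    t->right = c->next;
    c->next = t->left;
    t->left = c;
}

/* Move left: transfer the head of `left` onto `right`. */
void move_left(tape *t) {
    if (t->left == NULL) t->left = push(NULL, 0);
    cell *c = t->left;
    t->left = c->next;
    c->next = t->right;
    t->right = c;
}
```

Every node here is reached through a pointer, which is exactly where the argument below says C runs into trouble.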

A concrete implementation running on a computer would run out of memory if the tape got too long, but an ideal implementation could execute the Turing machine program faithfully. This can be done with pen and paper, or, if the program ever runs out of memory, by buying a computer with more memory, a compiler targeting an architecture with more bits per word, and so on.

This doesn't work in C because it's impossible to have a linked list that can grow forever: there's always some limit on the number of nodes.

To explain why, I first need to explain what a C implementation is. C is actually a family of programming languages. The ISO C standard (more precisely, a specific version of this standard) defines (with the level of formality that English allows) the syntax and semantics of a family of programming languages. C has a lot of undefined behavior and implementation-defined behavior. An “implementation” of C codifies all the implementation-defined behavior (for C99, the list of things to codify is in Annex J). Each implementation of C is a separate programming language. Note that the meaning of the word “implementation” is a bit peculiar: what it really means is a language variant, and there can be multiple different compiler programs that implement the same language variant.

In a given implementation of C, a byte has $2^{\texttt{CHAR\_BIT}}$ possible values. All data can be represented as an array of bytes: a type t has at most $2^{\texttt{CHAR\_BIT} \times \texttt{sizeof(t)}}$ possible values. This number varies between implementations of C, but for a given implementation of C, it's a constant.

In particular, pointers can only take at most $2^{\texttt{CHAR\_BIT} \times \texttt{sizeof(void*)}}$ values. This means that there is a finite maximum number of addressable objects.

The values of CHAR_BIT and sizeof(void*) are observable, so if you run out of memory, you can't just resume running your program with larger values for those parameters. You would be running the program under a different programming language — a different C implementation.
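A tiny illustration of why these parameters are observable: any program can compute the pointer width and print it or branch on it, so its behavior differs between implementations with different values. The helper name pointer_bits is hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Width in bits of the object representation of void*. At most
   2^pointer_bits() distinct pointer values exist, which caps the number of
   addressable objects; a program that prints this value behaves differently
   on implementations with different CHAR_BIT or sizeof(void*). */
size_t pointer_bits(void) {
    return (size_t)CHAR_BIT * sizeof(void *);
}
```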

If programs in a language can only have a bounded number of states, then the programming language is no more expressive than finite automata. The fragment of C that's restricted to addressable storage only allows at most $n \times 2^{\texttt{CHAR\_BIT} \times \texttt{sizeof(void*)}}$ program states, where $n$ is the size of the abstract syntax tree of the program (representing the state of the control flow); therefore this program can be simulated by a finite automaton with that many states. If C is more expressive, it has to be through the use of other features.

C does not directly impose a maximum recursion depth. An implementation is allowed to have a maximum, but it's also allowed not to have one. But how do we communicate between a function call and its parent? Arguments are no good if they're addressable, because that would indirectly limit the depth of recursion: if you have a function int f(int x) { … f(…) …} then all the occurrences of x on active frames of f have their own address and so the number of nested calls is bounded by the number of possible addresses for x.
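To make the premise concrete, here is a small sketch (the name nest and the fixed MAX_DEPTH are hypothetical) that recurses a fixed number of levels, records &x in every active frame, and checks that all recorded addresses differ while those frames are still live. Since objects with overlapping lifetimes must have distinct addresses, each extra level of nesting uses up one more pointer value:

```c
#define MAX_DEPTH 64   /* hypothetical nesting depth for this demonstration */

/* &x of every active activation of nest(), indexed by depth. */
static const int *frame_addr[MAX_DEPTH];

/* Pairwise comparison; only called while all recorded frames are still
   active, so every stored pointer is valid. */
static int all_distinct(int n) {
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (frame_addr[i] == frame_addr[j]) return 0;
    return 1;
}

/* Recurses MAX_DEPTH levels, recording the address of the argument x at each
   level, and reports whether all of those addresses differ. */
int nest(int x, int depth) {
    frame_addr[depth] = &x;
    if (depth + 1 == MAX_DEPTH) return all_distinct(MAX_DEPTH);
    return nest(x + 1, depth + 1);
}
```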

A C program can use non-addressable storage in the form of register variables. “Normal” implementations can only have a small, finite number of variables that don't have an address, but in theory an implementation could allow an unbounded amount of register storage. In such an implementation, you can make an unbounded number of recursive calls to a function, as long as its arguments are register. But since the arguments are register, you can't make a pointer to them, and so you need to copy their data around explicitly: you can only pass around a finite amount of data, not an arbitrary-sized data structure that's made of pointers.

With unbounded recursion depth, and the restriction that a function can only get data from its direct caller (register arguments) and return data to its direct caller (the function return value), you get the power of deterministic pushdown automata.
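One way to see this pushdown-automaton discipline in code is a recursive recognizer for balanced parentheses, a classic deterministic context-free language. This is a sketch under stated assumptions: the input comes through a hypothetical next_char() that reads a global string, standing in for a one-way stream like stdin (its position counter only ever moves forward, like an input head). A real compiler will still give each frame an address; the sketch only shows the programming style, not an implementation with unbounded register storage:

```c
#include <stddef.h>

/* Hypothetical one-way input: a string consumed left to right, standing in
   for a stream such as stdin. Only next_char() touches it. */
static const char *input;
static size_t input_pos;

static int next_char(void) {
    int c = (unsigned char)input[input_pos];
    if (c != '\0') input_pos++;   /* stay on the terminator once reached */
    return c;
}

/* Consumes a run of balanced groups "( ... )" and returns the first character
   that does not open a group, or -1 on a mismatch. Each activation keeps only
   register locals, so nothing it holds can be pointed to; results flow back
   to the caller solely through the return value. */
static int parse_seq(void) {
    register int c = next_char();
    while (c == '(') {
        register int closer = parse_seq();
        if (closer != ')') return -1;
        c = next_char();
    }
    return c;
}

/* 1 if s is a balanced parenthesis string, 0 otherwise. */
int balanced(const char *s) {
    input = s;
    input_pos = 0;
    return parse_seq() == '\0';
}
```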

I can't find a way to go further.

(Of course you could make the program store the tape content externally, through file input/output functions. But then you wouldn't be asking whether C is Turing-complete, but whether C plus an infinite storage system is Turing-complete, to which the answer is a boring “yes”. You might as well define the storage to be a Turing oracle — call fopen("oracle", "r+"), fwrite the initial tape content to it and fread back the final tape content.)
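For illustration, a sketch of keeping tape cells outside the program's address space, in a file (function names hypothetical). Note that fseek takes a long offset, so this particular sketch is itself bounded; it only shows the shape of the idea, with the file standing in for whatever external storage is assumed:

```c
#include <stdio.h>

/* Cell i of the tape lives at byte offset i of an external file. Returns 0
   on success, -1 on an I/O error. fseek is called before every access, which
   also satisfies the rule that an update stream needs a positioning call
   between a write and a following read. */
int file_cell_write(FILE *f, long i, unsigned char symbol) {
    if (fseek(f, i, SEEK_SET) != 0) return -1;
    return fputc(symbol, f) == EOF ? -1 : 0;
}

/* Returns the symbol in cell i; cells beyond the end of the file read as
   blank (0). */
int file_cell_read(FILE *f, long i) {
    if (fseek(f, i, SEEK_SET) != 0) return -1;
    int c = fgetc(f);
    return c == EOF ? 0 : c;
}
```

A stream opened with tmpfile() (mode "wb+") is one convenient way to try it.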

Gilles 'SO- stop being evil'

C99's addition of va_copy to the variadic argument API may give us a back door to Turing-completeness. Since it becomes possible to iterate through a variadic argument list more than once in a function other than the one that originally received the arguments, a va_list can be used to implement a pointerless pointer.

Of course, a real implementation of the variadic argument API is probably going to have a pointer somewhere, but in our abstract machine it can be implemented using magic instead.
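The key capability is small enough to show on its own. In this sketch (names hypothetical), outer receives variadic arguments and hands its va_list to another function, which walks the list twice by taking a fresh va_copy for each pass and never consumes the original directly:

```c
#include <stdarg.h>

/* Sums the first n int arguments in ap, twice over. Each pass works on its
   own va_copy, so ap itself is never advanced here. */
static int sum_twice(int n, va_list ap) {
    int total = 0;
    for (int pass = 0; pass < 2; pass++) {
        va_list cursor;
        va_copy(cursor, ap);
        for (int i = 0; i < n; i++)
            total += va_arg(cursor, int);
        va_end(cursor);
    }
    return total;
}

/* Receives the variadic arguments and passes the list to another function. */
int outer(int n, ...) {
    va_list ap;
    va_start(ap, n);
    int result = sum_twice(n, ap);
    va_end(ap);
    return result;
}
```

For example, outer(3, 1, 2, 3) returns 12, because the three arguments are read twice.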

Here's a demo implementing a 2-stack pushdown automaton with arbitrary transition rules:

#include <stdarg.h>
typedef struct { va_list va; } wrapped_stack; // Struct wrapper needed if va_list is an array type.
#define NUM_SYMBOLS /* ... */
#define NUM_STATES /* ... */
typedef enum { NOP, POP1, POP2, PUSH1, PUSH2 } operation_type;
typedef struct { int next_state; operation_type optype; int opsymbol; } transition;
transition transition_table[NUM_STATES][NUM_SYMBOLS][NUM_SYMBOLS] = { /* ... */ };

void step(int state, va_list stack1, va_list stack2);
void push1(va_list stack2, int next_state, ...) {
    va_list stack1;
    va_start(stack1, next_state);
    step(next_state, stack1, stack2);
}
void push2(va_list stack1, int next_state, ...) {
    va_list stack2;
    va_start(stack2, next_state);
    step(next_state, stack1, stack2);
}
void step(int state, va_list stack1, va_list stack2) {
    va_list stack1_copy, stack2_copy;
    va_copy(stack1_copy, stack1); va_copy(stack2_copy, stack2);
    int symbol1 = va_arg(stack1_copy, int), symbol2 = va_arg(stack2_copy, int);
    transition tr = transition_table[state][symbol1][symbol2];
    wrapped_stack ws;
    switch(tr.optype) {
        // Each case recurses; with no halting state it never returns, but break
        // prevents falling through into the next case if a call ever does return.
        case NOP: step(tr.next_state, stack1, stack2); break;
        // Note: attempting to pop the stack's bottom value results in undefined behavior.
        case POP1: ws = va_arg(stack1_copy, wrapped_stack); step(tr.next_state, ws.va, stack2); break;
        case POP2: ws = va_arg(stack2_copy, wrapped_stack); step(tr.next_state, stack1, ws.va); break;
        case PUSH1: va_copy(ws.va, stack1); push1(stack2, tr.next_state, tr.opsymbol, ws); break;
        case PUSH2: va_copy(ws.va, stack2); push2(stack1, tr.next_state, tr.opsymbol, ws); break;
    }
}
void start_helper1(va_list stack1, int dummy, ...) {
    va_list stack2;
    va_start(stack2, dummy);
    step(0, stack1, stack2);
}
void start_helper0(int dummy, ...) {
    va_list stack1;
    va_start(stack1, dummy);
    start_helper1(stack1, 0, 0);
}
// Begin execution in state 0 with each stack initialized to {0}
void start(void) {
    start_helper0(0, 0);
}

Note: If va_list is an array type, then there are actually hidden pointer parameters to the functions. So it would probably be better to change the types of all va_list arguments to wrapped_stack.

feersum

Nonstandard arithmetic, maybe?

So, it seems that the issue is the finite size of sizeof(t). However, I think I know a workaround.

As far as I know, C does not require an implementation to use the standard integers for its integer types. Therefore, we could use a nonstandard model of arithmetic. Then we would set sizeof(t) to some nonstandard number, which we will never reach in a finite number of steps. Therefore, the length of the Turing machine's tape will always be less than the "maximum", since the maximum is literally impossible to reach: sizeof(t) simply is not a number in the regular sense of the word.

There is one technicality, of course: Tennenbaum's theorem. It states that the only computable model of Peano arithmetic is the standard one, which obviously would not do. However, as far as I know, C does not require implementations to use data types that satisfy the Peano axioms, nor does it require the implementation to be computable, so this should not be an issue.

What should happen if you try to output a nonstandard integer? Well, you can represent any nonstandard integer using a nonstandard string, so just stream digits from the front of that string.

user3840170
Christopher King

The answer is yes, but for an unexpected reason.

I believe the above comments are correct: given that size_t is bounded, you cannot represent a Turing machine with an unbounded tape by using C as intended. However, we can use C in a way that completely circumvents the size_t issue, by using only the C preprocessor.

I will not go over the whole proof here; this answer explains it best. Essentially, using deferred expressions it is possible to create recursively expanding macros that expand forever. In this way, the depth of the recursion becomes limited only by the number of scans that the machine executes. This is a physical limitation, but in theory the machine could scan forever. The answer also explains how logical operations can be constructed.
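A compact sketch of the deferred-expression technique, adapted from a widely circulated set of C preprocessor idioms (the macro names follow that convention; none of them is a standard API, and M is a hypothetical element macro). REPEAT refers to itself only through a deferred REPEAT_INDIRECT, so each rescan unfolds one more level instead of the self-reference being blocked. In this sketch the number of rescans is fixed by how many EVAL levels are written out:

```c
#define EMPTY()
#define DEFER(id) id EMPTY()
#define OBSTRUCT(...) __VA_ARGS__ DEFER(EMPTY)()
#define EXPAND(...) __VA_ARGS__
#define EAT(...)

/* Each EVAL level rescans its argument several times; nesting multiplies it. */
#define EVAL(...)  EVAL1(EVAL1(EVAL1(__VA_ARGS__)))
#define EVAL1(...) EVAL2(EVAL2(EVAL2(__VA_ARGS__)))
#define EVAL2(...) EVAL3(EVAL3(EVAL3(__VA_ARGS__)))
#define EVAL3(...) __VA_ARGS__

#define PRIMITIVE_CAT(a, ...) a ## __VA_ARGS__

/* Decrement by table lookup; enough for counts up to 8 in this demo. */
#define DEC(x) PRIMITIVE_CAT(DEC_, x)
#define DEC_0 0
#define DEC_1 0
#define DEC_2 1
#define DEC_3 2
#define DEC_4 3
#define DEC_5 4
#define DEC_6 5
#define DEC_7 6
#define DEC_8 7

/* Boolean logic: BOOL(0) is 0, BOOL of any other table value is 1. */
#define CHECK_N(x, n, ...) n
#define CHECK(...) CHECK_N(__VA_ARGS__, 0,)
#define PROBE(x) x, 1,
#define NOT(x) CHECK(PRIMITIVE_CAT(NOT_, x))
#define NOT_0 PROBE(~)
#define COMPL(b) PRIMITIVE_CAT(COMPL_, b)
#define COMPL_0 1
#define COMPL_1 0
#define BOOL(x) COMPL(NOT(x))
#define IIF(c) PRIMITIVE_CAT(IIF_, c)
#define IIF_0(t, ...) __VA_ARGS__
#define IIF_1(t, ...) t
#define IF(c) IIF(BOOL(c))
#define WHEN(c) IF(c)(EXPAND, EAT)

/* Emits macro(0, ...) macro(1, ...) ... macro(count - 1, ...). */
#define REPEAT(count, macro, ...) \
    WHEN(count) \
    ( \
        OBSTRUCT(REPEAT_INDIRECT) () \
        ( \
            DEC(count), macro, __VA_ARGS__ \
        ) \
        OBSTRUCT(macro) \
        ( \
            DEC(count), __VA_ARGS__ \
        ) \
    )
#define REPEAT_INDIRECT() REPEAT

/* Hypothetical element macro: emits its index followed by a comma. */
#define M(i, _) i,

static const int repeated[] = { EVAL(REPEAT(4, M, ~)) };   /* { 0, 1, 2, 3, } */
```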

So in conclusion: yes, C is Turing-complete, but you have to totally misuse the preprocessor.

John

IMO, a strong limitation is that the addressable space (via the pointer size) is finite, and this is unrecoverable.

One could advocate that memory can be "swapped to disk", but at some point the address information will itself exceed the addressable size.

You could define a language quite similar to C, where it is allowed to change sizeof, for example by assigning

sizeof(int) = 20;
sizeof(void*) += 4;

etc. That would be Turing complete. As your requirements go up, you would just modify sizeof(void*) to be sufficiently large, same with sizeof(size_t), and then you can use realloc to make your arrays bigger.

gnasher729

The problem isn’t that computers, being limited by the real world, cannot implement a Turing-complete machine. The problem in this question here is that the C language itself doesn’t allow it.

The C language could easily be changed to be Turing complete. Adding an integer type of unlimited size that can be used in malloc() and in pointer arithmetic would have solved the problem. In practice it wouldn't make any difference, since we can't even build a computer with enough memory for 64-bit pointers to become the limit.

gnasher729

Removable media allows us to circumvent the unbounded memory problem. Perhaps people will think this is an abuse, but I think it's OK and essentially unavoidable anyway.

Fix any implementation of a universal Turing machine. For the tape, we use removable media. When the head runs off the end or beginning of the current disc, the machine prompts the user to insert the next or previous one. We can either use a special marker to denote the left end of the simulated tape, or have a tape that's unbounded in both directions.

The key point here is that everything the C program must do is finite. The computer only needs enough memory to simulate the automaton, and size_t only needs to be big enough to address that (actually rather small) amount of memory and the cells of a single disc, which can be of any fixed finite size. Since the user is only prompted to insert the next or previous disc, we don't need unboundedly large integers to say "Please insert disc number 123456..."
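A sketch of the removable-media tape, assuming DISC_SIZE cells per disc and a swap callback standing in for the prompt to the user (all names are hypothetical). The program itself only tracks a position inside the current disc and only ever asks for the next or previous disc, so none of its integers grow with the length of the tape. The in-memory shelf at the bottom exists only to exercise the code; it plays the user's role and is the only place a disc index appears:

```c
#include <stddef.h>
#include <string.h>

#define DISC_SIZE 4   /* hypothetical: cells per disc; any fixed finite size */

/* Hook for the operator: shelve the disc in the drive and insert its
   neighbour (+1 = next, -1 = previous). No disc number is ever passed. */
typedef void (*swap_disc_fn)(unsigned char disc[DISC_SIZE], int direction);

typedef struct {
    unsigned char disc[DISC_SIZE];  /* contents of the disc currently inserted */
    size_t pos;                     /* head position within that disc only */
    swap_disc_fn swap;
} disc_tape;

unsigned char disc_read(const disc_tape *t) { return t->disc[t->pos]; }
void disc_write(disc_tape *t, unsigned char s) { t->disc[t->pos] = s; }

/* direction: +1 right, -1 left, 0 stay. */
void disc_move(disc_tape *t, int direction) {
    if (direction > 0) {
        if (t->pos == DISC_SIZE - 1) { t->swap(t->disc, +1); t->pos = 0; }
        else t->pos++;
    } else if (direction < 0) {
        if (t->pos == 0) { t->swap(t->disc, -1); t->pos = DISC_SIZE - 1; }
        else t->pos--;
    }
}

/* Demo stand-in for the human: a small, bounded in-memory shelf. Moving left
   of disc 0 is not handled; a left-end marker on the tape would prevent it. */
#define SHELF_DISCS 8
static unsigned char shelf[SHELF_DISCS][DISC_SIZE];
static int disc_in_drive = 0;

void simulated_user_swap(unsigned char disc[DISC_SIZE], int direction) {
    memcpy(shelf[disc_in_drive], disc, DISC_SIZE);   /* put the disc back */
    disc_in_drive += direction;                       /* fetch a neighbour */
    memcpy(disc, shelf[disc_in_drive], DISC_SIZE);
}
```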

I suppose the principal objection is likely to be the involvement of the user, but that seems to be unavoidable in any implementation, because there seems to be no other way of implementing unbounded memory.

David Richerby

The problem here is that you're artificially limiting the set of C programs to just the ones that use pointers (and size_t, intptr_t, what have you), but that's not necessary. If you're forced to use only the standard library, then "no" is kinda correct like in the accepted answer, but the semantics of C are indeed Turing complete.

Say you have a Real Turing Machine (TM) and it provides a library for pushing and popping from two stacks of unbounded size, and (for thoroughness) let's also say that they just compile to push and pop assembler instructions. Now your C program can push and pop infinitely on two stacks, unrestrained by the semantics of C. Et voilà, Turing completeness.

This is not the only way we can show that C is Turing complete, and it turns out that this is not a high bar for programming languages to meet. The real challenge is in constructing a language that is not Turing complete (e.g. SQL), and giving a proof for it.

Note: Other answers seem to suggest that Turing completeness requires arbitrary memory access, but that's simply not true. A push-down automaton with two unbounded stacks is sufficient, hence the scenario above.

Brent

Mapping "C" to a Turing equivalent abstract model of computation

C as it currently exists is not Turing complete because C inherently requires some fixed pointer size. We can, however, map a C-like language to an abstract model of computation that is Turing complete.

The basic syntax and semantics of C could be mapped to an abstract model of computation that is Turing equivalent. The RASP model of computation is a model that "C" can be mapped to.

Below is a variation of the RASP model that the x86/x64 language maps to. Since it is already known that "C" maps to the x86/x64 concrete models of computation, we know that "C" maps to the following abstract model:

Instruction
     : INTEGER ":" OPCODE                     // Address:Opcode 
     | INTEGER ":" OPCODE INTEGER             // Address:Opcode Operand 
     | INTEGER ":" OPCODE INTEGER "," INTEGER // Address:Opcode Operand, Operand
HEXDIGIT [a-fA-F0-9]
INTEGER  {HEXDIGIT}+ 
OPCODE   HEXDIGIT{4} 
// OPCODE distinguishes between INTEGER literal and machine address operand.
// OPCODE INTEGER is space delimited.
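For concreteness, a hedged sketch of reading one line of the grammar above with sscanf. The struct and function names are hypothetical, and unsigned long is a bounded stand-in for the grammar's unbounded INTEGER, so this parser itself only handles finite-width values. It is also looser than the grammar: it accepts opcodes shorter than four hex digits:

```c
#include <stdio.h>

/* Hypothetical parsed form of one Instruction line. */
typedef struct {
    unsigned long address;
    unsigned int opcode;        /* four hex digits in the grammar */
    unsigned long operand[2];
    int operand_count;          /* 0, 1 or 2 */
} instruction;

/* Parses "Address:Opcode", "Address:Opcode Operand" or
   "Address:Opcode Operand,Operand". Returns 1 on success, 0 otherwise. */
int parse_instruction(const char *line, instruction *out) {
    int n = sscanf(line, "%lx:%4x %lx,%lx", &out->address, &out->opcode,
                   &out->operand[0], &out->operand[1]);
    if (n < 2) return 0;        /* need at least address and opcode */
    out->operand_count = n - 2;
    return 1;
}
```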

Mapping the C language to the above abstract model of computation would enable the C language to become equivalent to a Turing machine.

Because the x86/x64 language maps to the above abstract model this also provides the basis for recognizing the subset of Turing equivalent x86/x64/C computations. As long as the required pointer size is no larger than the pointer size that is available the computation is Turing equivalent on finite hardware.

Turing equivalent computations derive equivalent output or fail to halt on equivalent input between the concrete machine and its Turing machine equivalent.

polcott

Unlike what has been stated, the C standard is more powerful than a DFA. Ignoring other features for information access other than pointers, it is probably true that a DFA could perfectly model C for a fixed character size and size_t due to the finite amount of memory.

But for a given program, i.e., a specific input for the C language, we can change size_t to fit the needs of the program. Without some fancy workaround we (for the sake of argument) still can't have an infinite amount of memory, but we can change the amount we have, i.e., the size of the tape, depending on the input. This means that the C family is pretty obviously a deterministic linear bounded automaton (a deterministic Turing machine with bounded memory), which is quite a bit stronger than a DFA or even an NPDA. Since I don't see any reason why our function of the input can't be quadratic, and NSPACE(O(n)) is a subset of DSPACE(O(n^2)) by Savitch's theorem, C is just as powerful as any nondeterministic LBA. This puts it just below a full Turing machine in terms of power.
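To put a number on "change size_t to fit the needs of the program", here is a hypothetical helper that computes the smallest address width b with $2^b$ at least the number of cells; for a quadratic space bound on an input of length n, one would call it with n * n. The cap at 64 is an artifact of computing with unsigned long long in this sketch:

```c
/* Smallest b such that 2^b >= cells: the address width an implementation
   would need to give each of `cells` tape cells its own address. */
unsigned address_bits(unsigned long long cells) {
    unsigned b = 0;
    while (b < 64 && (1ULL << b) < cells)
        b++;
    return b;
}
```

For example, an input of length 1000 with a quadratic bound needs address_bits(1000ULL * 1000ULL), which is 20.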

I spent some time trying to figure out if you could use streams to bypass the size_t constraints, but all options either only go in one direction (like stdin) or have a moving pointer. I think for you to be able to access an infinite tape of sorts you would need some sort of operating system or another way to change the direction. It might be a matter of insufficient creativity on my part, but I don't think so.

In conclusion: C probably is not Turing complete, but it is very close to it, both in what would need to change and in terms of computational power.

Cris

In practice, these restrictions are irrelevant to Turing completeness. The real requirement is to allow the tape to be arbitrarily long, not infinite. That would create a halting problem of a different kind (how does the universe "compute" the tape?).

It's as bogus as saying "Python isn't Turing complete because you can't make a list infinitely large".

Choose size_t to be infinitely large

You could choose size_t to be infinitely large. Naturally, it is impossible to realize such an implementation. But that's no surprise, given the finite nature of the world we live in.

Practical Implications

But even if it were possible to realize such an implementation, there would be practical issues. Consider the following C statement:

printf("%zu\n",SIZE_MAX);

This prints a decimal representation of SIZE_MAX to standard output. SIZE_MAX is $2^N - 1$, where $N \le \texttt{CHAR\_BIT} \times \texttt{sizeof(size\_t)}$ is the number of value bits of size_t. So if size_t is infinitely large, SIZE_MAX is also infinitely large. The only way I know to print the decimal form of an infinitely large number is to produce an infinite stream of decimal digits. This means that printf will not terminate in some cases.
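The relationship between SIZE_MAX and the width of size_t can be checked directly. This sketch (helper name hypothetical) counts the value bits N by shifting SIZE_MAX right until it reaches zero; on any real implementation N is finite, at most CHAR_BIT * sizeof(size_t), and at least 16, since C99 requires SIZE_MAX to be at least 65535:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* SIZE_MAX is 2^N - 1, where N is the number of value bits of size_t.
   Count N by shifting SIZE_MAX right until nothing is left. */
unsigned size_t_value_bits(void) {
    unsigned n = 0;
    for (size_t m = SIZE_MAX; m != 0; m >>= 1)
        n++;
    return n;
}
```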

Fortunately, for our theoretical purposes, I could not find any requirement in the specification that guarantees printf will terminate for all inputs. So, as far as I am aware, we do not violate the C specification here.

On Computational Completeness

It still remains to prove that our theoretical implementation is Turing Complete. We can show this by implementing "any single-taped Turing Machine".

Most of us have probably implemented a Turing Machine as a school project. I won't give the details of a specific implementation, but here's a commonly used strategy:

  • The number of states, number of symbols, and state transition table are fixed for any given machine. So we can represent states and symbols as numbers, and the state transition table as a 2-dimensional array.
  • The tape can be represented as a linked list. We can either use a single double-linked list, or two single-linked lists (one for each direction from the current position).

Now let's see what's required to realize such an implementation:

  • The ability to represent some fixed, but arbitrarily large, set of numbers. In order to represent any arbitrary number, we choose INT_MAX to be infinite as well. (Alternatively, we could use other objects to represent states and symbols.)
  • The ability to construct an arbitrarily large linked list for our tape. Once again, there is no finite limit on the size. This means we cannot construct this list up-front, as we would spend forever just to construct our tape. But, we can construct this list incrementally if we use dynamic memory allocation. We can use malloc, but there's a little more we must consider:
    • The C specification allows malloc to fail if, e.g., available memory has been exhausted. So our implementation is only truly universal if malloc never fails.
    • However, if our implementation is run on a machine with infinite memory, then there is no need for malloc to fail. Without violating the C standard, our implementation will guarantee that malloc will never fail.
  • The ability to dereference pointers, lookup array elements, and access the members of a linked list node.

So the above list is what is necessary to implement a Turing Machine in our hypothetical C implementation. These features must terminate. Anything else, however, may be allowed to not terminate (unless required by the standard). This includes arithmetic, IO, etc.
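A compact sketch of the strategy in the lists above, under stated assumptions: a hypothetical toy machine (binary increment) with a two-dimensional transition table, and a doubly linked tape whose blank cells are allocated with malloc only when the head first reaches them. A real universal implementation would also need malloc never to fail, which no concrete machine can promise; here allocation failure just aborts:

```c
#include <stdlib.h>

/* Hypothetical toy machine: binary increment. Symbols: 0 = blank, 1 = bit
   '0', 2 = bit '1'. The head starts on the least significant (rightmost) bit. */
enum { BLANK, ZERO, ONE, NUM_SYMBOLS };
enum { CARRY, DONE, NUM_STATES };                  /* DONE is the halting state */

typedef struct { int next_state, write, move; } rule;  /* move: -1, 0 or +1 */

/* The fixed transition table, indexed [state][symbol]. Rows for DONE are unused. */
static const rule table[NUM_STATES][NUM_SYMBOLS] = {
    [CARRY] = {
        [BLANK] = { DONE,  ONE,  0 },    /* carry into a new leading bit */
        [ZERO]  = { DONE,  ONE,  0 },    /* 0 + carry = 1, stop */
        [ONE]   = { CARRY, ZERO, -1 },   /* 1 + carry = 0, carry moves left */
    },
};

typedef struct node { int symbol; struct node *prev, *next; } node;

/* Returns the neighbouring cell, allocating a blank one the first time the
   head goes there: the tape is built incrementally, not up front. */
static node *neighbour(node *n, int move) {
    node **link = move < 0 ? &n->prev : &n->next;
    if (*link == NULL) {
        node *fresh = malloc(sizeof *fresh);
        if (fresh == NULL) abort();              /* a real machine can run out */
        fresh->symbol = BLANK;
        fresh->prev = move < 0 ? NULL : n;
        fresh->next = move < 0 ? n : NULL;
        *link = fresh;
    }
    return *link;
}

/* Runs from state CARRY with the head on `head` until DONE; returns the final head. */
node *run(node *head) {
    int state = CARRY;
    while (state != DONE) {
        rule r = table[state][head->symbol];
        head->symbol = r.write;
        if (r.move != 0) head = neighbour(head, r.move);
        state = r.next_state;
    }
    return head;
}
```

Running it on the tape 011 with the head on the last cell leaves 100; on 11 it allocates one new cell on the left and leaves 100 as well.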

Nathan Davis

The main argument here was that the size of size_t is finite, although it can be arbitrarily large.

There is a workaround for it, though I am not sure whether it is consistent with ISO C.

Assume you have a machine with infinite memory, so pointer size is not bounded. You still have your size_t type. If you ask me what sizeof(size_t) is, the answer is just sizeof(size_t). If you ask whether it is greater than, say, 100, the answer is yes. If you ask what sizeof(size_t) / 2 is, as you might guess, the answer is still sizeof(size_t). If you want to print it, we can agree on some output. The difference of these two can be NaN, and so on.

The summary is that relaxing the condition that size_t have a finite size won't break any existing programs.

P.S. Allocating sizeof(size_t) bytes of memory is still possible: you only need a countable size, so let's say you take all the evens (or a similar trick).

Eugene