It is said that there are uncountably many languages but only countably many Turing machines. Could someone make this clear to me? And this doesn't mean that the set of Turing machines is finite, does it?
3 Answers
A quick informal answer:
a Turing machine (states, transitions, etc.) can be encoded as a string of $0$'s and $1$'s;
so you can take all the binary strings in length-then-lexicographic (shortlex) order (0, 1, 00, 01, 10, 11, 000, 001, ...) and enumerate the Turing machines (i.e. build a one-to-one correspondence between natural numbers and Turing machines) by repeating the following steps:
1) start with $i=1$, $m=1$
2) generate the next binary string $s_i$ in the order above
3.1) if $s_i$ is a valid encoding of a Turing machine, then output $s_i$ as the $m$-th Turing machine and set $m = m+1$;
3.2) if $s_i$ is not a valid encoding of a Turing machine, then ignore it
4) set $i = i+1$ and go to step 2
In this way each natural number ($m= 1,2,3,...$) corresponds to a Turing machine, and each Turing machine has a corresponding $m$, because you scan all possible binary strings. So the set of Turing machines is countable.
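The enumeration steps above can be sketched in Python. This is only an illustration: `is_valid_encoding` is a hypothetical toy predicate (here, "the string ends in 1"), since the answer does not fix a concrete encoding; a real check would parse the string as states, transitions, and so on.

```python
from itertools import count, islice, product


def binary_strings():
    """Yield every non-empty binary string, shortest first: 0, 1, 00, 01, ..."""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def is_valid_encoding(s):
    # ASSUMPTION: toy stand-in for "s encodes a Turing machine".
    # Any decidable predicate on strings would work in its place.
    return s.endswith("1")


def enumerate_machines(is_valid=is_valid_encoding):
    """Yield (m, s) pairs: s is the m-th valid encoding, m = 1, 2, 3, ..."""
    m = 1
    for s in binary_strings():      # steps 2 and 4: next string s_i
        if is_valid(s):             # step 3.1: output it as machine number m
            yield m, s
            m += 1
        # step 3.2: invalid strings are simply skipped


first_five = list(islice(enumerate_machines(), 5))
```

Because the generators are lazy, the infinite loop in the steps becomes a stream you can cut off wherever you like; every valid encoding is reached after finitely many steps, which is what countability requires.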
On the other hand, consider the set $S$ of all possible strings over the alphabet $\{0,1\}$:
$S =\{0,1\}^* = \{\epsilon,0,1,00,01,10,11,000,001,...\}$
A language $L$ is a subset (possibly infinite) of $S$: $L \subseteq S$.
So the set of all languages is exactly the power set of $S$:
$2^S = \{ \emptyset, \{\epsilon\}, \{0\}, \{1\}, \{0,1\}, ... \}$
But the power set of a countably infinite set is uncountable (it can be easily proved using the diagonalization method).
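For completeness, here is a sketch of the standard argument behind that claim (Cantor's theorem). The names $f$, $D$ and $d$ are introduced only for this sketch. Suppose some map $f : S \to 2^S$ were onto, and define

$$D = \{\, s \in S : s \notin f(s) \,\}.$$

Since $D \subseteq S$ and $f$ is onto, there is some $d \in S$ with $f(d) = D$. But then

$$d \in D \iff d \notin f(d) \iff d \notin D,$$

which is impossible. So no map from $S$ onto $2^S$ exists; since $S$ is countably infinite, this means $2^S$ cannot be enumerated by the natural numbers.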
You can also apply the diagonalization method directly to the set of languages: suppose that the languages are countable; then we can arrange all of them in a table in which every (infinite) row $i$ represents the language $L_i$ and the columns represent the strings over the alphabet $\{0,1\}$ (entry $(i,j)=1$ if and only if the $j$-th string is in $L_i$; the empty string is left out of the columns below for brevity):
      0    1    00   01   10   11   ...
L1    0    0    1    0    1    0
L2    1    1    0    1    0    1
L3    0    1    0    0    1    0
...
Then define a new language by flipping the "membership flag" of every diagonal entry $(i,i)$:
      0    1    00   01   10   11   ...
L1   [1]   0    1    0    1    0
L2    1   [0]   0    1    0    1
L3    0    1   [1]   0    1    0
...
The new language $L_{new} = \{0,00,...\}$ built from the modified diagonal is different from every language $L_i$: $0 \notin L_1$ but $0 \in L_{new}$; $1 \in L_2$ but $1 \notin L_{new}$; $00 \notin L_3$ but $00 \in L_{new}$; and so on. But this is a contradiction, because by hypothesis the above table should be an enumeration of all languages.
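The diagonal flip can be checked mechanically on the finite corner of the table shown above. This is a sketch on the three example rows only; the real argument runs over the whole infinite table, and the helper names are made up for illustration.

```python
# Column labels and the three example rows from the table above.
strings = ["0", "1", "00", "01", "10", "11"]
table = [
    [0, 0, 1, 0, 1, 0],  # L1
    [1, 1, 0, 1, 0, 1],  # L2
    [0, 1, 0, 0, 1, 0],  # L3
]


def flipped_diagonal(rows):
    """Flip entry (i, i) of each row: the membership flags of L_new."""
    return [1 - row[i] for i, row in enumerate(rows)]


def differs_from_every_row(rows, diagonal):
    """L_new disagrees with L_i about the i-th string, for every listed i."""
    return all(row[i] != diagonal[i] for i, row in enumerate(rows))


diag = flipped_diagonal(table)
# Strings among the first three columns that belong to L_new.
l_new_prefix = {strings[i] for i, flag in enumerate(diag) if flag == 1}
```

Here `diag` comes out as `[1, 0, 1]`, matching the bracketed entries, and `l_new_prefix` is `{"0", "00"}`, the start of $L_{new}$. Whatever rows you put in the table, `differs_from_every_row` holds by construction, which is the whole point of the argument.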
A Turing machine always has a finite description: each Turing machine has a finite number of states, transitions and tape symbols. We can map these to a canonical representation, e.g. a string, which will be finite in length.
The set of all finite-length strings over a finite alphabet is countable, and the set of valid Turing machine encodings is a subset of it, so it is countable as well.
Countable means finite or countably infinite (i.e. it can be put in one-to-one correspondence with the set of natural numbers $\mathbb N$). Uncountable means not countable (infinite, and cannot be put in one-to-one correspondence with $\mathbb N$).
A language is a set of strings (let's restrict ourselves to the alphabet $\Sigma=\{0,1\}$). The set $\Sigma^*$ is countably infinite, and the set of all languages is its power set $2^{\Sigma^*}$, which is uncountable by a known theorem (if $S$ is countably infinite, then $2^S$ is uncountable)$^1$. This proves that the set of languages is uncountable.
I'll let somebody else answer the other part.
$^1$ I can write a proof if you want.