Week 9: Universal TMs and undecidability¶

In [1]:
from tock import *
# Tock's native notation for TMs is different; use Sipser's
# If you get an error here, please upgrade to the latest Tock
settings.display_direction_as = 'alpha'

In [2]:
%%html
<style>.nbviewer div.output_area { overflow-y: auto; max-height: 480px; }</style>


Tuesday¶

Universal Turing machines¶

Read page 202, just from "Before we get to the proof..." to "...stored-program computers."

Most of you are familiar with the idea of a programming language interpreter, which takes a program and some input to the program, and produces the same output that the program itself would have:

The universal TM (UTM) $U$ defined on page 202 is TM that is a TM interpreter. It can simulate any other TM. We've seen machines that simulate other machines (for example, the intersection of two DFAs $M_1$ and $M_2$ simulates both $M_1$ and $M_2$), but in those cases, the simulated machine was always hard-coded into the simulator. A universal TM is different because the code of the simulated machine is part of the input. That is, it takes as input both the code of a TM $M$ and an input string $w$, and simulates what $M$ would do on $w$.

This concept is important because it is used in some of the undecidability proofs we'll see soon, and also because it's related to the concept of a stored-program computer.

The so-called implementation given in the book does not provide much insight into how one might actually write a UTM. We've seen one instance already: last class, we ran a C-to-TM compiler on a TM simulator written in C. The resulting (huge) Turing machine is a UTM.

But let's think about a more conventional implementation of a UTM. First, we have to show how to encode a TM as a string. Here's Turing's original encoding:

• If the states in $Q$ are numbered $q_1, q_2, \ldots$, then state $q_i$ is encoded as $\mathtt{DA}^i$. The start state is $q_1$. Turing's definition didn't have accept or reject states, but we just have to fix some convention, like $q_2$ is the accept state and $q_3$ is the reject state.
• If the symbols in $\Gamma$ are numbered $a_0, a_1, \ldots$, where $a_0$ is the blank symbol, then symbol $a_i$ is encoded as $\mathtt{DC}^i$.
• Then, the transition is encoded as $\mathtt{DA}^i \mathtt{DC}^j \mathtt{DC}^l \mathtt{L} \mathtt{DA}^k$, and similarly if the move is R or N (for "no move," equivalent to the book's S).
• The machine is encoded as $əT_1;T_2;\cdots;T_n⸬$ where the $T_i$ are the transitions of the machine.

Second, we have to show the universal TM itself. It is often constructed as a TM with three tapes:

1. An encoding of $M$, the machine being simulated.
2. The tape of $M$.
3. The state of $M$.

An implementation description would be: On input $\langle M, w\rangle$, where $M$ is a TM and $w$ is a string:

1. Split the input into $M$ on tape 1 and $w$ on tape 2.
2. Initialize tape 3 to the start state of $M$.
3. Repeat:
1. If the state (tape 3) is the accept state, accept; if it is the reject state, reject.
2. Search on tape 1 for an instruction that matches the current state (encoded on tape 3) and current input symbol (encoded on tape 2).
3. Write the new state to tape 3 and the new symbol to tape 2.
4. Move the head on tape 2 to the left or right as indicated by the instruction.

The smallest UTM¶

There's a cottage industry of seeing who can make the smallest universal TM. The current record holder, due to Rogozhin, is this (I modified it slightly for Sipser's definition of TM):

In [3]:
u = read_csv('utm.csv')
u


The reason it can be so small is that the way it "encodes" a TM is actually to convert it into a simpler (but still Turing-equivalent) formalism called a 2-tag system. Here's an example TM, Sipser's $M_2$:

In [4]:
m2 = read_csv('tm-m2.csv')
m2


This machine converts to a 2-tag system with 450 rules. The rules look like CFG rules, but they work differently. (Match the first symbol with a rule's left-hand side, remove the first two symbols, and append the right-hand side.)

In [5]:
%cat tm-m2.tag

('A',_0,_1) -> ('C',_0,_1) x ('c',_0,_1) x
('α',_0,_1) -> ('c',_0,_1) x ('c',_0,_1) x
('B',_0,_1) -> ('S',_0,_1)
('β',_0,_1) -> ('s',_0,_1)
('D',_0,_1) -> ('A',_0,_1) x
('d',_0,_1) -> ('α',_0,_1) x
('T',_0,_1) -> ('B',_0,_1) x
('t',_0,_1) -> ('β',_0,_1) x
('A',_(0,_'write0',_'move+2'),_1) -> ('C',_(0,_'write0',_'move+2'),_1) x
('α',_(0,_'write0',_'move+2'),_1) -> ('c',_(0,_'write0',_'move+2'),_1) x ('c',_(0,_'write0',_'move+2'),_1) x
('B',_(0,_'write0',_'move+2'),_1) -> ('S',_(0,_'write0',_'move+2'),_1)
('β',_(0,_'write0',_'move+2'),_1) -> ('s',_(0,_'write0',_'move+2'),_1)
('C',_(0,_'write0',_'move+2'),_1) -> ('D',_(0,_'move+1'),_1) ('D',_(0,_'move+1'),_0)
('c',_(0,_'write0',_'move+2'),_1) -> ('d',_(0,_'move+1'),_1) ('d',_(0,_'move+1'),_0)
('S',_(0,_'write0',_'move+2'),_1) -> ('T',_(0,_'move+1'),_1) ('T',_(0,_'move+1'),_0)
('s',_(0,_'write0',_'move+2'),_1) -> ('t',_(0,_'move+1'),_1) ('t',_(0,_'move+1'),_0)
('D',_(0,_'write0',_'move+2'),_1) -> ('A',_(0,_'write0',_'move+2'),_1) x
('d',_(0,_'write0',_'move+2'),_1) -> ('α',_(0,_'write0',_'move+2'),_1) x
('T',_(0,_'write0',_'move+2'),_1) -> ('B',_(0,_'write0',_'move+2'),_1) x
('t',_(0,_'write0',_'move+2'),_1) -> ('β',_(0,_'write0',_'move+2'),_1) x
('A',_(0,_'move+1'),_0) -> ('C',_(0,_'move+1'),_0) x
('α',_(0,_'move+1'),_0) -> ('c',_(0,_'move+1'),_0) x ('c',_(0,_'move+1'),_0) x
('B',_(0,_'move+1'),_0) -> ('S',_(0,_'move+1'),_0)
('β',_(0,_'move+1'),_0) -> ('s',_(0,_'move+1'),_0)
('C',_(0,_'move+1'),_0) -> ('D',_1,_1) ('D',_1,_0)
('c',_(0,_'move+1'),_0) -> ('d',_1,_1) ('d',_1,_0)
('S',_(0,_'move+1'),_0) -> ('T',_1,_1) ('T',_1,_0)
('s',_(0,_'move+1'),_0) -> ('t',_1,_1) ('t',_1,_0)
('D',_(0,_'move+1'),_0) -> x ('A',_(0,_'move+1'),_0) x
('d',_(0,_'move+1'),_0) -> ('α',_(0,_'move+1'),_0) x
('T',_(0,_'move+1'),_0) -> ('B',_(0,_'move+1'),_0) x
('t',_(0,_'move+1'),_0) -> ('β',_(0,_'move+1'),_0) x
('A',_1,_0) -> ('C',_1,_0) x
('α',_1,_0) -> ('c',_1,_0) x ('c',_1,_0) x
('B',_1,_0) -> ('S',_1,_0)
('β',_1,_0) -> ('s',_1,_0)
('D',_1,_0) -> x ('A',_1,_0) x
('d',_1,_0) -> ('α',_1,_0) x
('T',_1,_0) -> ('B',_1,_0) x
('t',_1,_0) -> ('β',_1,_0) x
('A',_1,_1) -> ('C',_1,_1) x ('c',_1,_1) x
('α',_1,_1) -> ('c',_1,_1) x ('c',_1,_1) x
('B',_1,_1) -> ('S',_1,_1)
('β',_1,_1) -> ('s',_1,_1)
('D',_1,_1) -> ('A',_1,_1) x
('d',_1,_1) -> ('α',_1,_1) x
('T',_1,_1) -> ('B',_1,_1) x
('t',_1,_1) -> ('β',_1,_1) x
('A',_(1,_'write0',_'move+2'),_1) -> ('C',_(1,_'write0',_'move+2'),_1) x
('α',_(1,_'write0',_'move+2'),_1) -> ('c',_(1,_'write0',_'move+2'),_1) x ('c',_(1,_'write0',_'move+2'),_1) x
('B',_(1,_'write0',_'move+2'),_1) -> ('S',_(1,_'write0',_'move+2'),_1)
('β',_(1,_'write0',_'move+2'),_1) -> ('s',_(1,_'write0',_'move+2'),_1)
('C',_(1,_'write0',_'move+2'),_1) -> ('D',_(1,_'move+1'),_1) ('D',_(1,_'move+1'),_0)
('c',_(1,_'write0',_'move+2'),_1) -> ('d',_(1,_'move+1'),_1) ('d',_(1,_'move+1'),_0)
('S',_(1,_'write0',_'move+2'),_1) -> ('T',_(1,_'move+1'),_1) ('T',_(1,_'move+1'),_0)
('s',_(1,_'write0',_'move+2'),_1) -> ('t',_(1,_'move+1'),_1) ('t',_(1,_'move+1'),_0)
('D',_(1,_'write0',_'move+2'),_1) -> ('A',_(1,_'write0',_'move+2'),_1) x
('d',_(1,_'write0',_'move+2'),_1) -> ('α',_(1,_'write0',_'move+2'),_1) x
('T',_(1,_'write0',_'move+2'),_1) -> ('B',_(1,_'write0',_'move+2'),_1) x
('t',_(1,_'write0',_'move+2'),_1) -> ('β',_(1,_'write0',_'move+2'),_1) x
('A',_(1,_'move+1'),_1) -> ('C',_(1,_'move+1'),_1) x ('c',_(1,_'move+1'),_1) x
('α',_(1,_'move+1'),_1) -> ('c',_(1,_'move+1'),_1) x ('c',_(1,_'move+1'),_1) x
('B',_(1,_'move+1'),_1) -> ('S',_(1,_'move+1'),_1)
('β',_(1,_'move+1'),_1) -> ('s',_(1,_'move+1'),_1)
('C',_(1,_'move+1'),_1) -> ('D',_2,_1) ('D',_2,_0)
('c',_(1,_'move+1'),_1) -> ('d',_2,_1) ('d',_2,_0)
('S',_(1,_'move+1'),_1) -> ('T',_2,_1) ('T',_2,_0)
('s',_(1,_'move+1'),_1) -> ('t',_2,_1) ('t',_2,_0)
('D',_(1,_'move+1'),_1) -> ('A',_(1,_'move+1'),_1) x
('d',_(1,_'move+1'),_1) -> ('α',_(1,_'move+1'),_1) x
('T',_(1,_'move+1'),_1) -> ('B',_(1,_'move+1'),_1) x
('t',_(1,_'move+1'),_1) -> ('β',_(1,_'move+1'),_1) x
('A',_2,_0) -> ('C',_2,_0) x
('α',_2,_0) -> ('c',_2,_0) x ('c',_2,_0) x
('B',_2,_0) -> ('S',_2,_0)
('β',_2,_0) -> ('s',_2,_0)
('D',_2,_0) -> x ('A',_2,_0) x
('d',_2,_0) -> ('α',_2,_0) x
('T',_2,_0) -> ('B',_2,_0) x
('t',_2,_0) -> ('β',_2,_0) x
('D',_(2,_'write0',_'move-2'),_0) -> x ('A',_(2,_'write0',_'move-2'),_0) x
('d',_(2,_'write0',_'move-2'),_0) -> ('α',_(2,_'write0',_'move-2'),_0) x
('T',_(2,_'write0',_'move-2'),_0) -> ('B',_(2,_'write0',_'move-2'),_0) x
('t',_(2,_'write0',_'move-2'),_0) -> ('β',_(2,_'write0',_'move-2'),_0) x
('D',_(2,_'move-1'),_0) -> x ('A',_(2,_'move-1'),_0) x
('d',_(2,_'move-1'),_0) -> ('α',_(2,_'move-1'),_0) x
('T',_(2,_'move-1'),_0) -> ('B',_(2,_'move-1'),_0) x
('t',_(2,_'move-1'),_0) -> ('β',_(2,_'move-1'),_0) x
('D',_(2,_'move-1'),_1) -> ('A',_(2,_'move-1'),_1) x
('d',_(2,_'move-1'),_1) -> ('α',_(2,_'move-1'),_1) x
('T',_(2,_'move-1'),_1) -> ('B',_(2,_'move-1'),_1) x
('t',_(2,_'move-1'),_1) -> ('β',_(2,_'move-1'),_1) x
('A',_2,_1) -> ('C',_2,_1) x ('c',_2,_1) x
('α',_2,_1) -> ('c',_2,_1) x ('c',_2,_1) x
('B',_2,_1) -> ('S',_2,_1)
('β',_2,_1) -> ('s',_2,_1)
('D',_2,_1) -> ('A',_2,_1) x
('d',_2,_1) -> ('α',_2,_1) x
('T',_2,_1) -> ('B',_2,_1) x
('t',_2,_1) -> ('β',_2,_1) x
('A',_3,_0) -> ('C',_3,_0) x
('α',_3,_0) -> ('c',_3,_0) x ('c',_3,_0) x
('B',_3,_0) -> ('S',_3,_0)
('β',_3,_0) -> ('s',_3,_0)
('D',_3,_0) -> x ('A',_3,_0) x
('d',_3,_0) -> ('α',_3,_0) x
('T',_3,_0) -> ('B',_3,_0) x
('t',_3,_0) -> ('β',_3,_0) x
('A',_3,_1) -> ('C',_3,_1) x ('c',_3,_1) x
('α',_3,_1) -> ('c',_3,_1) x ('c',_3,_1) x
('B',_3,_1) -> ('S',_3,_1)
('β',_3,_1) -> ('s',_3,_1)
('D',_3,_1) -> ('A',_3,_1) x
('d',_3,_1) -> ('α',_3,_1) x
('T',_3,_1) -> ('B',_3,_1) x
('t',_3,_1) -> ('β',_3,_1) x
('A',_(3,_'write0',_'move+2'),_1) -> ('C',_(3,_'write0',_'move+2'),_1) x
('α',_(3,_'write0',_'move+2'),_1) -> ('c',_(3,_'write0',_'move+2'),_1) x ('c',_(3,_'write0',_'move+2'),_1) x
('B',_(3,_'write0',_'move+2'),_1) -> ('S',_(3,_'write0',_'move+2'),_1)
('β',_(3,_'write0',_'move+2'),_1) -> ('s',_(3,_'write0',_'move+2'),_1)
('C',_(3,_'write0',_'move+2'),_1) -> ('D',_(3,_'move+1'),_1) ('D',_(3,_'move+1'),_0)
('c',_(3,_'write0',_'move+2'),_1) -> ('d',_(3,_'move+1'),_1) ('d',_(3,_'move+1'),_0)
('S',_(3,_'write0',_'move+2'),_1) -> ('T',_(3,_'move+1'),_1) ('T',_(3,_'move+1'),_0)
('s',_(3,_'write0',_'move+2'),_1) -> ('t',_(3,_'move+1'),_1) ('t',_(3,_'move+1'),_0)
('D',_(3,_'write0',_'move+2'),_1) -> ('A',_(3,_'write0',_'move+2'),_1) x
('d',_(3,_'write0',_'move+2'),_1) -> ('α',_(3,_'write0',_'move+2'),_1) x
('T',_(3,_'write0',_'move+2'),_1) -> ('B',_(3,_'write0',_'move+2'),_1) x
('t',_(3,_'write0',_'move+2'),_1) -> ('β',_(3,_'write0',_'move+2'),_1) x
('A',_(3,_'move+1'),_1) -> ('C',_(3,_'move+1'),_1) x ('c',_(3,_'move+1'),_1) x
('α',_(3,_'move+1'),_1) -> ('c',_(3,_'move+1'),_1) x ('c',_(3,_'move+1'),_1) x
('B',_(3,_'move+1'),_1) -> ('S',_(3,_'move+1'),_1)
('β',_(3,_'move+1'),_1) -> ('s',_(3,_'move+1'),_1)
('C',_(3,_'move+1'),_1) -> ('D',_2,_1) ('D',_2,_0)
('c',_(3,_'move+1'),_1) -> ('d',_2,_1) ('d',_2,_0)
('S',_(3,_'move+1'),_1) -> ('T',_2,_1) ('T',_2,_0)
('s',_(3,_'move+1'),_1) -> ('t',_2,_1) ('t',_2,_0)
('D',_(3,_'move+1'),_1) -> ('A',_(3,_'move+1'),_1) x
('d',_(3,_'move+1'),_1) -> ('α',_(3,_'move+1'),_1) x
('T',_(3,_'move+1'),_1) -> ('B',_(3,_'move+1'),_1) x
('t',_(3,_'move+1'),_1) -> ('β',_(3,_'move+1'),_1) x
('A',_4,_0) -> ('C',_4,_0) x
('α',_4,_0) -> ('c',_4,_0) x ('c',_4,_0) x
('B',_4,_0) -> ('S',_4,_0)
('β',_4,_0) -> ('s',_4,_0)
('D',_4,_0) -> x ('A',_4,_0) x
('d',_4,_0) -> ('α',_4,_0) x
('T',_4,_0) -> ('B',_4,_0) x
('t',_4,_0) -> ('β',_4,_0) x
('D',_(4,_'write0',_'move-2'),_0) -> x ('A',_(4,_'write0',_'move-2'),_0) x
('d',_(4,_'write0',_'move-2'),_0) -> ('α',_(4,_'write0',_'move-2'),_0) x
('T',_(4,_'write0',_'move-2'),_0) -> ('B',_(4,_'write0',_'move-2'),_0) x
('t',_(4,_'write0',_'move-2'),_0) -> ('β',_(4,_'write0',_'move-2'),_0) x
('D',_(4,_'move-1'),_0) -> x ('A',_(4,_'move-1'),_0) x
('d',_(4,_'move-1'),_0) -> ('α',_(4,_'move-1'),_0) x
('T',_(4,_'move-1'),_0) -> ('B',_(4,_'move-1'),_0) x
('t',_(4,_'move-1'),_0) -> ('β',_(4,_'move-1'),_0) x
('D',_(4,_'move-1'),_1) -> ('A',_(4,_'move-1'),_1) x
('d',_(4,_'move-1'),_1) -> ('α',_(4,_'move-1'),_1) x
('T',_(4,_'move-1'),_1) -> ('B',_(4,_'move-1'),_1) x
('t',_(4,_'move-1'),_1) -> ('β',_(4,_'move-1'),_1) x
('A',_4,_1) -> ('C',_4,_1) x ('c',_4,_1) x
('α',_4,_1) -> ('c',_4,_1) x ('c',_4,_1) x
('B',_4,_1) -> ('S',_4,_1)
('β',_4,_1) -> ('s',_4,_1)
('D',_4,_1) -> ('A',_4,_1) x
('d',_4,_1) -> ('α',_4,_1) x
('T',_4,_1) -> ('B',_4,_1) x
('t',_4,_1) -> ('β',_4,_1) x
('D',_(4,_'write1',_'move-2'),_1) -> ('A',_(4,_'write1',_'move-2'),_1) x
('d',_(4,_'write1',_'move-2'),_1) -> ('α',_(4,_'write1',_'move-2'),_1) x
('T',_(4,_'write1',_'move-2'),_1) -> ('B',_(4,_'write1',_'move-2'),_1) x
('t',_(4,_'write1',_'move-2'),_1) -> ('β',_(4,_'write1',_'move-2'),_1) x
("S'",_0,_1) -> ('B',_0,_1) x
("s'",_0,_1) -> ('β',_0,_1) x
('β',_(0,_'read1'),_0) -> ("s'",_(0,_'write0',_'move+2'),_1) ("s'",_(0,_'write0',_'move+2'),_0) ("s'",_(0,_'write0',_'move+2'),_1) ("s'",_(0,_'write0',_'move+2'),_0)
("S'",_(0,_'write0',_'move+2'),_1) -> ('B',_(0,_'write0',_'move+2'),_1) x
("s'",_(0,_'write0',_'move+2'),_1) -> ('β',_(0,_'write0',_'move+2'),_1) x
("S'",_(0,_'move+1'),_0) -> x ('B',_(0,_'move+1'),_0) x
("s'",_(0,_'move+1'),_0) -> ('β',_(0,_'move+1'),_0) x
("S'",_1,_0) -> x ('B',_1,_0) x
("s'",_1,_0) -> ('β',_1,_0) x
("S'",_1,_1) -> ('B',_1,_1) x
("s'",_1,_1) -> ('β',_1,_1) x
('B',_(1,_'read1'),_0) -> ("S'",_(1,_'write0',_'move+2'),_1) ("S'",_(1,_'write0',_'move+2'),_0) ("s'",_(1,_'write0',_'move+2'),_1) ("s'",_(1,_'write0',_'move+2'),_0)
('β',_(1,_'read1'),_0) -> ("s'",_(1,_'write0',_'move+2'),_1) ("s'",_(1,_'write0',_'move+2'),_0) ("s'",_(1,_'write0',_'move+2'),_1) ("s'",_(1,_'write0',_'move+2'),_0)
("S'",_(1,_'write0',_'move+2'),_1) -> ('B',_(1,_'write0',_'move+2'),_1) x
("s'",_(1,_'write0',_'move+2'),_1) -> ('β',_(1,_'write0',_'move+2'),_1) x
("S'",_(1,_'move+1'),_1) -> ('B',_(1,_'move+1'),_1) x
("s'",_(1,_'move+1'),_1) -> ('β',_(1,_'move+1'),_1) x
("S'",_2,_0) -> x ('B',_2,_0) x
("s'",_2,_0) -> ('β',_2,_0) x
('β',_(2,_'read0'),_0) -> ("s'",_(2,_'write0',_'move-2'),_1) ("s'",_(2,_'write0',_'move-2'),_0) ("s'",_(2,_'write0',_'move-2'),_1) ("s'",_(2,_'write0',_'move-2'),_0)
('A',_(2,_'write0',_'move-2'),_0) -> ('C',_(2,_'write0',_'move-2'),_0)
('α',_(2,_'write0',_'move-2'),_0) -> ('c',_(2,_'write0',_'move-2'),_0)
('B',_(2,_'write0',_'move-2'),_0) -> ("S'",_(2,_'move-1'),_1) ("S'",_(2,_'move-1'),_0)
('β',_(2,_'write0',_'move-2'),_0) -> ("s'",_(2,_'move-1'),_1) ("s'",_(2,_'move-1'),_0) ("s'",_(2,_'move-1'),_1) ("s'",_(2,_'move-1'),_0)
('C',_(2,_'write0',_'move-2'),_0) -> ('A',_(2,_'move-1'),_1) ('A',_(2,_'move-1'),_0)
('c',_(2,_'write0',_'move-2'),_0) -> ('α',_(2,_'move-1'),_1) ('α',_(2,_'move-1'),_0)
("S'",_(2,_'write0',_'move-2'),_0) -> x ('B',_(2,_'write0',_'move-2'),_0) x
("s'",_(2,_'write0',_'move-2'),_0) -> ('β',_(2,_'write0',_'move-2'),_0) x
('A',_(2,_'move-1'),_0) -> ('C',_(2,_'move-1'),_0)
('α',_(2,_'move-1'),_0) -> ('c',_(2,_'move-1'),_0)
('B',_(2,_'move-1'),_0) -> ("S'",_4,_1) ("S'",_4,_0)
('β',_(2,_'move-1'),_0) -> ("s'",_4,_1) ("s'",_4,_0) ("s'",_4,_1) ("s'",_4,_0)
('C',_(2,_'move-1'),_0) -> ('A',_4,_1) ('A',_4,_0)
('c',_(2,_'move-1'),_0) -> ('α',_4,_1) ('α',_4,_0)
("S'",_(2,_'move-1'),_0) -> x ('B',_(2,_'move-1'),_0) x
("s'",_(2,_'move-1'),_0) -> ('β',_(2,_'move-1'),_0) x
('A',_(2,_'move-1'),_1) -> ('C',_(2,_'move-1'),_1)
('α',_(2,_'move-1'),_1) -> ('c',_(2,_'move-1'),_1)
('B',_(2,_'move-1'),_1) -> ("S'",_4,_1) ("S'",_4,_0) ("s'",_4,_1) ("s'",_4,_0)
('β',_(2,_'move-1'),_1) -> ("s'",_4,_1) ("s'",_4,_0) ("s'",_4,_1) ("s'",_4,_0)
('C',_(2,_'move-1'),_1) -> ('A',_4,_1) ('A',_4,_0)
('c',_(2,_'move-1'),_1) -> ('α',_4,_1) ('α',_4,_0)
("S'",_(2,_'move-1'),_1) -> ('B',_(2,_'move-1'),_1) x
("s'",_(2,_'move-1'),_1) -> ('β',_(2,_'move-1'),_1) x
("S'",_2,_1) -> ('B',_2,_1) x
("s'",_2,_1) -> ('β',_2,_1) x
("S'",_3,_0) -> x ('B',_3,_0) x
("s'",_3,_0) -> ('β',_3,_0) x
("S'",_3,_1) -> ('B',_3,_1) x
("s'",_3,_1) -> ('β',_3,_1) x
('B',_(3,_'read1'),_0) -> ("S'",_(3,_'write0',_'move+2'),_1) ("S'",_(3,_'write0',_'move+2'),_0) ("s'",_(3,_'write0',_'move+2'),_1) ("s'",_(3,_'write0',_'move+2'),_0)
('β',_(3,_'read1'),_0) -> ("s'",_(3,_'write0',_'move+2'),_1) ("s'",_(3,_'write0',_'move+2'),_0) ("s'",_(3,_'write0',_'move+2'),_1) ("s'",_(3,_'write0',_'move+2'),_0)
("S'",_(3,_'write0',_'move+2'),_1) -> ('B',_(3,_'write0',_'move+2'),_1) x
("s'",_(3,_'write0',_'move+2'),_1) -> ('β',_(3,_'write0',_'move+2'),_1) x
("S'",_(3,_'move+1'),_1) -> ('B',_(3,_'move+1'),_1) x
("s'",_(3,_'move+1'),_1) -> ('β',_(3,_'move+1'),_1) x
("S'",_4,_0) -> x ('B',_4,_0) x
("s'",_4,_0) -> ('β',_4,_0) x
('B',_(4,_'read0'),_1) -> ("S'",_(4,_'write0',_'move-2'),_1) ("S'",_(4,_'write0',_'move-2'),_0) ("s'",_(4,_'write0',_'move-2'),_1) ("s'",_(4,_'write0',_'move-2'),_0)
('β',_(4,_'read0'),_1) -> ("s'",_(4,_'write0',_'move-2'),_1) ("s'",_(4,_'write0',_'move-2'),_0) ("s'",_(4,_'write0',_'move-2'),_1) ("s'",_(4,_'write0',_'move-2'),_0)
('A',_(4,_'write0',_'move-2'),_0) -> ('C',_(4,_'write0',_'move-2'),_0)
('α',_(4,_'write0',_'move-2'),_0) -> ('c',_(4,_'write0',_'move-2'),_0)
('B',_(4,_'write0',_'move-2'),_0) -> ("S'",_(4,_'move-1'),_1) ("S'",_(4,_'move-1'),_0)
('β',_(4,_'write0',_'move-2'),_0) -> ("s'",_(4,_'move-1'),_1) ("s'",_(4,_'move-1'),_0) ("s'",_(4,_'move-1'),_1) ("s'",_(4,_'move-1'),_0)
('C',_(4,_'write0',_'move-2'),_0) -> ('A',_(4,_'move-1'),_1) ('A',_(4,_'move-1'),_0)
('c',_(4,_'write0',_'move-2'),_0) -> ('α',_(4,_'move-1'),_1) ('α',_(4,_'move-1'),_0)
("S'",_(4,_'write0',_'move-2'),_0) -> x ('B',_(4,_'write0',_'move-2'),_0) x
("s'",_(4,_'write0',_'move-2'),_0) -> ('β',_(4,_'write0',_'move-2'),_0) x
('A',_(4,_'move-1'),_0) -> ('C',_(4,_'move-1'),_0)
('α',_(4,_'move-1'),_0) -> ('c',_(4,_'move-1'),_0)
('B',_(4,_'move-1'),_0) -> ("S'",_4,_1) ("S'",_4,_0)
('β',_(4,_'move-1'),_0) -> ("s'",_4,_1) ("s'",_4,_0) ("s'",_4,_1) ("s'",_4,_0)
('C',_(4,_'move-1'),_0) -> ('A',_4,_1) ('A',_4,_0)
('c',_(4,_'move-1'),_0) -> ('α',_4,_1) ('α',_4,_0)
("S'",_(4,_'move-1'),_0) -> x ('B',_(4,_'move-1'),_0) x
("s'",_(4,_'move-1'),_0) -> ('β',_(4,_'move-1'),_0) x
('A',_(4,_'move-1'),_1) -> ('C',_(4,_'move-1'),_1)
('α',_(4,_'move-1'),_1) -> ('c',_(4,_'move-1'),_1)
('B',_(4,_'move-1'),_1) -> ("S'",_4,_1) ("S'",_4,_0) ("s'",_4,_1) ("s'",_4,_0)
('β',_(4,_'move-1'),_1) -> ("s'",_4,_1) ("s'",_4,_0) ("s'",_4,_1) ("s'",_4,_0)
('C',_(4,_'move-1'),_1) -> ('A',_4,_1) ('A',_4,_0)
('c',_(4,_'move-1'),_1) -> ('α',_4,_1) ('α',_4,_0)
("S'",_(4,_'move-1'),_1) -> ('B',_(4,_'move-1'),_1) x
("s'",_(4,_'move-1'),_1) -> ('β',_(4,_'move-1'),_1) x
("S'",_4,_1) -> ('B',_4,_1) x
("s'",_4,_1) -> ('β',_4,_1) x
('β',_(4,_'read1'),_0) -> ("s'",_(4,_'write1',_'move-2'),_1) ("s'",_(4,_'write1',_'move-2'),_0) ("s'",_(4,_'write1',_'move-2'),_1) ("s'",_(4,_'write1',_'move-2'),_0)
('A',_(4,_'write1',_'move-2'),_1) -> ('C',_(4,_'write1',_'move-2'),_1)
('α',_(4,_'write1',_'move-2'),_1) -> ('c',_(4,_'write1',_'move-2'),_1)
('B',_(4,_'write1',_'move-2'),_1) -> ("S'",_(4,_'move-1'),_1) ("S'",_(4,_'move-1'),_0) ("s'",_(4,_'move-1'),_1) ("s'",_(4,_'move-1'),_0)
('β',_(4,_'write1',_'move-2'),_1) -> ("s'",_(4,_'move-1'),_1) ("s'",_(4,_'move-1'),_0) ("s'",_(4,_'move-1'),_1) ("s'",_(4,_'move-1'),_0)
('C',_(4,_'write1',_'move-2'),_1) -> ('A',_(4,_'move-1'),_1) ('A',_(4,_'move-1'),_0)
('c',_(4,_'write1',_'move-2'),_1) -> ('α',_(4,_'move-1'),_1) ('α',_(4,_'move-1'),_0)
("S'",_(4,_'write1',_'move-2'),_1) -> ('B',_(4,_'write1',_'move-2'),_1) x
("s'",_(4,_'write1',_'move-2'),_1) -> ('β',_(4,_'write1',_'move-2'),_1) x


This 2-tag system, together with the input string $\texttt{0}$, are encoded into a string with 6.5 million symbols. Running the UTM is very slow. I had to write a special simulator that enables it to "fast-forward" in certain cases, and found that it accepts after 46 trillion moves.

The real numbers are uncountable¶

Now we shift for the moment to a totally different topic, the uncountability of the real numbers. The concept of an uncountable infinity, and the associated proof technique of diagonalization, was discovered by Georg Cantor. Unfortunately, it also drove him crazy.

Read subsection "The Diagonalization Method," from page 202 to the table at the top of page 206.

Watch W9E2: Diagonalization.

Suppose that there are countably many real numbers. Then there is a way to enumerate all the real numbers in $[0,1)$. We can write the digits after the decimal point in a table like this:

$x_1$ .1 4 1 5 $\cdots$
$x_2$ .5 5 5 5 $\cdots$
$x_3$ .1 2 3 4 $\cdots$
$x_4$ .5 0 0 0 $\cdots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$

Then we can form a new real number by taking the diagonal and changing every digit (but avoid 0 and 9, as explained in the book):

$x_1$ .1 4 1 5 $\cdots$
$x_2$ .5 5 5 5 $\cdots$
$x_3$ .1 2 3 4 $\cdots$
$x_4$ .5 0 0 0 $\cdots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$
$x'$ .2 6 4 1 $\cdots$

This new number cannot be equal to any row of the table, which is a contradiction.

There are uncountably many languages¶

We can adapt Cantor's argument for the uncountability of the reals to an argument for the uncountability of the set of all languages. Our proof is different from the book's proof of Corollary 4.18 (page 206), and I think it should be okay just to read ours.

Suppose that there are countably many languages over a finite alphabet $\Sigma$. Then we can number them $L_1, L_2, L_3, \ldots$.

We can number all strings in $\Sigma^\ast$ in shortlex order. Call the $j$th string in this ordering $w^{(j)}$. (We use this notation to avoid confusion with the notation $w_j$ for the $j$th symbol of $w$.)

Imagine a big table whose $i$th row is $L_i$ and whose $j$th column is $w^{(j)}$, and cell $(i,j)$ says whether $w^{(j)} \in L_i$. For illustration's sake, we assume $\Sigma = \{\mathtt{0}, \mathtt{1}\}$.

$\varepsilon$ $\mathtt{0}$ $\mathtt{1}$ $\mathtt{00}$ $\cdots$
$L_1$ no yes no yes $\cdots$
$L_2$ no yes yes no $\cdots$
$L_3$ yes no yes no $\cdots$
$L_4$ no no no yes $\cdots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$

Then we can form a new language by taking all the diagonal entries of this table and inverting them (yes to no and no to yes):

$\varepsilon$ $\mathtt{0}$ $\mathtt{1}$ $\mathtt{00}$ $\cdots$
$L_1$ no yes no yes $\cdots$
$L_2$ no yes yes no $\cdots$
$L_3$ yes no yes no $\cdots$
$L_4$ no no no yes $\cdots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$
$L'$ yes no no no $\cdots$

More formally, $$L' = \{ w^{(i)} \mid w^{(i)} \not\in L_i \}.$$

This language $L'$ cannot be equal to any row of the table, because for any $i$, either $w^{(i)} \in L'$ but $w^{(i)} \not\in L_i$, or $w^{(i)} \not\in L'$ but $w^{(i)} \in L_i$.

But the table supposedly contains all possible languages, which is a contradiction. Therefore there are uncountably many languages.

Thursday¶

Today's topic is undecidability: is there such a thing as a language that can't be decided by a Turing machine (and therefore, by any computer program as we know it)?

The answer is yes, and not only so, but almost all languages are undecidable.

There is an undecidable language¶

Since Turing machines can be encoded as strings, there is a way to number all possible Turing machines $M_1, M_2, M_3, \ldots$ (e.g., sort their encodings in shortlex order and then number them consecutively). It follows that there are languages (in fact, almost all languages) which cannot be decided by any Turing machine. But we'd like a proof that explicitly constructs such a language.

Imagine a big table whose $i$th row is $M_i$ and whose $j$th column is $w^{(j)}$, and cell $(i,j)$ says whether $w^{(j)} \in \mathcal{L}(M_i)$ (that is, it says "yes" if $M_i$ accepts $w^{(j)}$, but "no" if $M_i$ rejects or loops on $w^{(j)}$.)

$\varepsilon$ $\mathtt{0}$ $\mathtt{1}$ $\mathtt{00}$ $\cdots$
$M_1$ no yes no yes $\cdots$
$M_2$ no yes yes no $\cdots$
$M_3$ yes no yes no $\cdots$
$M_4$ no no no yes $\cdots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$

We can again form a new language by taking all the diagonal entries of this table and inverting them:

$$L_D = \{w^{(i)} \mid \text{M_i does not accept w^{(i)}}\}$$

And this language again cannot be equal to any row of the table, because for any $i$, either $w^{(i)} \in L_D$ but $M_i$ does not accept $w^{(i)}$, or $w^{(i)} \not\in L_D$ but $M_i$ accepts $w^{(i)}$.

This means that there is no such thing as a Turing machine that decides $L_D$. (In fact, there isn't even one that recognizes it.)

The undecidable language $A_{\mathsf{TM}}$¶

The book's first example of an undecidable language is

$$A_{\mathsf{TM}} = \{ \langle M, w \rangle \mid \text{M accepts w} \}.$$

This language will be very useful for proving other undecidable languages. We can prove $A_{\mathsf{TM}}$ undecidable by continuing our proof above. Suppose that there exists a decider $H$ that decides $A_{\mathsf{TM}}$.

Then we would be able to compute the value of any cell of the above table since $H$ is supposedly a decider. But that would enable us to define a decider for $L_D$. Namely, define a TM $D$ that, on input $w$,

1. Find $i$ such that $w = w^{(i)}$.
2. Construct machine $M_i$.
3. Run $H$ on input $\langle M_i, w^{(i)} \rangle$.
4. If $H$ accepts, reject, and if $H$ rejects, accept.

But we know this is impossible, either by using the undecidability of $L_D$ as shown above, or we can repeat the argument again. If $D$ exists, then for some $i$, $D = M_i$. If $D$ accepts $w^{(i)}$, then (by the definition of $D$) we know that $H$ rejects $\langle D, w^{(i)}\rangle$, but (by the definition of $H$) it was supposed to accept. Similarly, if $D$ rejects $w^{(i)}$, then (by the definition of $D$) we know that $H$ accepts $\langle D, w^{(i)}\rangle$, but (by the definition of $H$) it was supposed to reject. Therefore, $D$ cannot exist, and $A_{\mathsf{TM}}$ is undecidable.

Read: If the above isn't clear, try the book's proof on pages 207–209.

The halting problem¶

The rest of this notebook can be read after Thursday's class.

Most books use the following language as their first undecidable language:

$$\mathit{HALT}_{\mathsf{TM}} = \{ \langle M, w \rangle \mid \text{M halts on w} \}.$$

This is probably the most well-known undecidable problem: Can you write a program that looks at another program $M$ and input $w$ and decides whether $M$ halts or loops on $w$?

Question: Adapt the above diagonalization argument to prove that $\mathit{HALT}_{\mathsf{TM}}$ is undecidable.

For fun, you can read Geoff Pullum's "Scooping the Loop Snooper: A proof that the Halting Problem is undecidable."

The following table may help. Unlike our proof above (and like Sipser's), the columns of the table are not the strings in shortlex order, but the encodings of $M_1, M_2, \ldots$.

$M_1$ $M_2$ $M_3$ $M_4$ $\cdots$ $Q$ $\cdots$
$M_1$ bad! good! bad! good! $\cdots$ good! $\cdots$
$M_2$ bad! good! good! bad! $\cdots$ bad! $\cdots$
$M_3$ good! bad! good! bad! $\cdots$ good! $\cdots$
$M_4$ bad! bad! bad! good! $\cdots$ good! $\cdots$
$\vdots$ $\vdots$ $\vdots$ $\vdots$ $\vdots$ $\ddots$ bad! $\cdots$
$Q$ good! bad! bad! bad! $\cdots$ ??? $\cdots$

Read the subsection "A Turing-Unrecognizable Language."

A first look at reducibility¶

Now that we've proven that one language ($A_{\mathsf{TM}}$) is undecidable, we can use it to prove that other languages are undecidable.

Read pages 215–217, up to "via the diagonalization method."

The first example is the halting problem $\mathit{HALT}_{\mathsf{TM}}$. In many textbooks (and the poem "Scooping the Loop Snooper"), the halting problem is actually the prototypical undecidable language, but Sipser does things a little differently.

To prove that the halting problem is undecidable, you assume that it is decidable, that is, there is a TM $R$ that decides it. Then, you show that armed with such a TM, you could implement another TM, $S$, that decides $A_{\mathsf{TM}}$, which is a contradiction because we know that $A_{\mathsf{TM}}$ is undecidable.

Remember that the direction of the reduction is the opposite of what most people intuitively think of first. If you want to show that the halting problem is undecidable, you do not reduce the halting problem to $A_{\mathsf{TM}}$; you reduce $A_{\mathsf{TM}}$ to the halting problem. To avoid confusion (and, you will see that the potential for confusion grows below), we give a nickname to each TM to help you remember which is which. Call $R$ the "loop snooper," in homage to Pullum, and $S$ the "universal decider" (because it's a universal TM but it always halts).

So, suppose that we had a TM $R$ (the loop snooper) that decides the halting problem $\mathit{HALT}_{\mathsf{TM}}$. Then, designing a universal decider $S$ would be easy: $S =$ "On input $\langle M, w\rangle$,

1. Use the loop snooper $R$ to check whether $M$ loops on $w$.
2. If a loop is detected, reject.
3. If no loop is detected, we can safely simulate $M$ on $w$.
4. If it accepts, accept.
5. If it rejects, reject.

But last time we showed (by diagonalization) that the universal decider $S$ does not exist. Therefore, the loop snooper $R$ cannot exist either.

The Penrose-Lucas argument (optional)¶

Turing machines were invented as a model of what it means for humans, not computing machines, to compute, and so a natural question is, can Turing machines serve as a model for all human reasoning? I want to present to you one argument for the "no" position. Turing called it "the mathematical objection" and it is usually known today as the Penrose-Lucas argument. This version, which is an interesting variation on the diagonalization argument for the undecidability of the halting problem, is due to Penrose and comes from an article criticizing him. Although I agree with the conclusion, I don't agree with the argument -- what do you think about it?

Let $R$ be a "partial loop-snooper," a decider TM that detects some cases of looping. That is, on input $\langle M, w\rangle$, if $R$ accepts, then $M$ loops on input $w$. But if $R$ rejects, it doesn't mean anything.

Now we go through the usual diagonalization argument; the only difference is that it doesn't lead to a contradiction, but to a machine/input pair that is beyond $R$'s detection abilities. Assume an ordering $M_1, M_2, \ldots$ on Turing machines and an ordering $w^{(1)}, w^{(2)}, \ldots$ on strings. We can build a big table with the results of $R$ on all machines and inputs:

$\varepsilon$ $\mathtt{0}$ $\mathtt{1}$ $\mathtt{00}$ $\cdots$
$M_1$ don't know loops don't know loops
$M_2$ don't know loops loops don't know
$M_3$ loops don't know loops don't know
$M_4$ don't know don't know don't know loops
$\vdots$

We define $D$ to be the Turing machine that does the opposite of the diagonal of this table. That is, on input $w$:

1. Find $i$ such that $w = w^{(i)}$.
2. Run $R$, the partial loop-snooper, on $\langle M_i, w^{(i)}\rangle$.
3. If $R$ detected an infinite loop, accept.
4. Otherwise, go into an infinite loop.

Now, there must be an $i$ such that $D = M_i$. We claim that $D$ loops on $w^{(i)}$ but $R$ does not detect it.

• For if $R$ accepts $\langle D, w^{(i)}\rangle$, then $D$ must not loop on $w^{(i)}$, but that would be a contradiction.
• But if $R$ rejects $\langle D, w^{(i)}\rangle$, then $D$ must loop on $w^{(i)}$, which is not a contradiction.

So, in fact, $D$ does loop on $w^{(i)}$, but $R$ does not detect it.

It's critical that you understand the argument thus far before moving on.

Suppose that you are equivalent to a Turing machine. You have a partial ability to detect looping. So it should be possible to construct a Turing machine $Y$ that accepts $\langle M, w\rangle$ in exactly those cases when you are able to detect that $M$ loops on input $w$.

Then, by the above argument, there is a machine/input pair $\langle D, w^{(i)}\rangle$ that loops, but $Y$ rejects it. By the definition of $Y$, you are not able to detect that $\langle D, w^{(i)}\rangle$ loops. But if you understood the above argument, then you know that $\langle D, w^{(i)}\rangle$ does loop. This is a contradiction! Therefore, congratulations! You are not equivalent to a Turing machine.