Solar Nomad

October 30, 2024

In my post about esolangs I mentioned Turing machines and gave a quick overview of how they work. Out of curiosity I decided to write a simulation of one in C - in this post I'll go into detail about the implementation of the machine itself as well as a custom language (and parser) I wrote for programming it.

The basics

A Turing machine is more of a mathematical construct than an actual physically implementable machine, and Wikipedia can describe its formal definition better than I can:

For the sake of simplicity we'll deviate slightly from this definition. The set of tape alphabet symbols and that of input symbols will just be the set of ASCII characters, and we'll allow the blank symbol to be included in the input.

The first thing we'll implement is the machine's overall structure. We need a tape, a pointer to the selected cell on the tape, and a list of possible states.


#define TAPE_SIZE 10000
#define MAX_ID_LEN 50

typdef struct State {
	int id;
	char name[MAX_ID_LEN];
	int num_instructions;
	Instruction instructions[255]; // 255 ASCII characters
} State;

typedef struct Machine {
	char tape[TAPE_SIZE];
	unsigned int pointer;
	State **states;
	char **state_ids; // Mapping state names to IDs so we can find them via indexing
	int initial_state;
	int num_states;
} Machine;

We want to be able to program the machine too, so we'll need a format for storing instructions. Each instruction will have a "read" character, a "write" character, a shift direction, and a value defining which state to move to.


typedef enum Shift { LEFT, RIGHT, NONE } Shift;

typedef struct Instruction {
	char read; // If the current cell contains this value...
	char write; // Replace it with this value...
	Shift shift; // Move in this direction...
	char next_state_name[MAX_ID_LEN]; // And go to this state
	int next_state;
} Instruction;

Machine operation

Simulating the machine is remarkably simple.

1. Copy input to tape, place pointer on first character of input
2. Set the current state to the initial state
3. While the current state is NOT the halt state:
a. Get the value of the current cell and look for an instruction with a corresponding read value
b. If there is no valid instruction, halt
c. Replace the current cell's value with the write value
d. Move the pointer in the chosen direction
e. Change the current state to the next state determined by the instruction
4. Halt

Here's a basic example of how this could be implemented in C:


for (int i = 0; i < TAPE_SIZE; i++)
m->tape[i] = BLANK; // BLANK is defined as the underscore '_'

// Copy input to tape
int i = 0;
while (*input != '\0')
	m->tape[m->pointer + i++] = *input++;

State* curr_state = m->states[m->initial_state]; // m is a pointer to a Machine
// get_next_instruction iterates through the array of instructions and selects the one with the given read value
Instruction* instruction = get_next_instruction(curr_state, m->tape[m->pointer]);

while (instruction != NULL)
{
	if (instruction->shift == LEFT)
		m->pointer--;
	else if (instruction->shift == RIGHT)
		m->pointer++;

	if (instruction->next_state == -1) // Halt state
		break;

	curr_state = m->states[instruction->next_state];
	instruction = get_next_instruction(curr_state, m->tape[m->pointer]);
}

Yep, that's it. You now have a working, programmable Turing machine simulator. I've left out some potentially critical aspects such as bounds checking and memory deallocation (feel free to check the GitHub repo for more details), but this is essentially the core of the program, and it's only 20-something lines long.

Implementing a programming language

The obvious downside to this implementation is the fact that programming a machine requires hardcoding every state and instruction. I've devised a dead-simple programming language with a very simple syntax to alleviate this:


state_name {
	read_value: write_value shift_direction next_state;
	...
}

In the spirit of C (and to keep parsing complexity to a minimum), we're using curly braces to group instructions inside states and semicolons to end them.

Writing a parser is somewhat out of the scope of this post, so I'll keep things short. Before parsing comes the lexical tokenization stage, where we move through the input program character by character, splitting it into various kinds of tokens to make the parser's life a little easier.

We'll convert


state_3 {
	0: 1 > state_3;
	1: 1 = HALT;
}

into the more digestible


IDENTIFIER state_3
OPEN_BRACKET
VALUE 0
COLON
VALUE 1
DIRECTION >
IDENTIFIER state_3
SEPARATOR
VALUE 1
COLON
VALUE 1
DIRECTION =
IDENTIFIER HALT
SEPARATOR
CLOSE_BRACKET

Tokens are stored in a Token struct, containing an enum Type and an (optional) character array. The array of Tokens is then passed to the parser itself. The one I've written is primitive, with extremely bare-bones error checking, but it gets the job done, iterating through the tokens and constructing States and Instructions:


State *first = states; // states is a pre-allocated array
int num_states = 0;
bool in_state = false;

Token *t = tokenize(raw_text);
while (t->type != NA)
{
	if (t->type == IDENTIFIER)
	{
		if (in_state)
		{
			fprintf(stderr, "missing close bracket\n");
			return 0;
		}

		in_state = true;
		strcpy(states->name, t->contents);

		t++;
		if (t->type != OPEN_BRACKET)
		{
			fprintf(stderr, "missing open bracket\n");
			return 0;
		}
		t++;
	}
	else if (t->type == VALUE)
	{
		// collect instruction tokens and add them to the current state's instructions array
		// ...
	}
	else if (t->type == CLOSE_BRACKET)
	{
		if (in_state)
		{
			in_state = false;
			states++;
			num_states++;
			t++;
		}
		else
		{
			fprintf(stderr, "bracket mismatch\n");
			return 0;
		}
	}
}

Putting it together

There's a few things left to do. Each state has its user-provided name, but we also want to assign each one a unique internal ID so we can find states by indexing the machine's state array instead of iterating through all of it until we find the right one.

I also divided the program into neat functions so we can have a nice and simple main file:


char *filename = argv[1];
char input[MAX_INPUT_LEN];
parse_input(argv[2], input);

State *states = malloc(100 * sizeof(State));
int num_states = parse(filename, states);

Machine machine;
init_machine(&machine, input, states, num_states);

int result = run_machine(&machine);
if (result < 0)
	printf("Program halted: FAILURE\n");
else
{
	printf("Program halted: SUCCESS (reached halt state after %d steps)\n", result);
	print_final_state(machine.tape, machine.pointer);
}

Conclusion

This was a very high-level overview of how I implemented a Turing machine simulator in C. It was a very fun learning experience and I recommend trying to fill in the blanks yourself.

The full version of the program is available on GitHub and includes some additional features such as wildcard values, comments, and even tape animation. There's something weirdly engaging in watching the pointer move back and forth along the tape flipping cell values. Go give it a shot!

Simulating Turing machines in C

The basics

Machine operation

Implementing a programming language

Putting it together

Conclusion