Writing an Assembler: How and Why?

27/04/2025

About 2 months ago I discovered Urbit.

It's a weird little (kinda Unix-like?) operating system written in about 30,000 LOC. It isn't based on any existing code such as a Linux or *BSD kernel, and it's involved in some weird crypto stuff for some reason, but that's beside the point.

What interested me about the operating system was how it was implemented, specifically its underlying language, Nock.

What's a Nock?

As the team behind Urbit puts it, Nock is a "low-level functional programming language" or a "Turing-complete function that maps a cell [subject formula] to a noun product, where subject is the data, formula is the code, and product is the result", but let's not worry about that second definition...

The part that interested me about Nock was how tiny some implementations of the language are.

Here are some statistics:

This by itself is crazy enough, but there are already tiny implementations of other languages such as Brainfuck. What sets Nock apart is that it is not a toy. A language this small being the base of an entire operating system shows that the power of a language is not correlated with its size at all.

The idea of a Nock, to me, represents a tiny (yet extremely powerful) low-level systems programming language. So I wanted to build my own.

Designing My Own "Nock"

Here are some ground rules for my language:

  1. Syntax should be as simple as possible
  2. Compiled binary should be as small as possible
  3. Code should compile to an ELF binary

Firstly, I defined the syntax of this language. The simplest possible code I could think of was an array of hexadecimal bytes corresponding to the bytes in the final binary, for example:

hello.asm

48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 0a ; "Hello, world\n"

This makes my language lean closer to machine code than assembly, but the whole point of this is to be low level.
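To make the syntax concrete, here is a minimal Python sketch of how a file like hello.asm could be turned into bytes. It is an illustration only, not the actual assembler (which is written in assembly). The function name `assemble`, the rule that `;` starts a comment running to the end of the line, and the requirement that every token is exactly two hex digits are all assumptions inferred from the example above.

```python
import string


def assemble(source: str) -> bytes:
    """Convert hex-byte source text into raw bytes (hypothetical helper).

    Assumed rules: tokens are separated by whitespace, each token is
    exactly two hexadecimal digits, and ';' begins a comment that runs
    to the end of the line.
    """
    out = bytearray()
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Keep only the part of the line before the first ';'.
        code = line.split(";", 1)[0]
        for token in code.split():
            is_hex = all(ch in string.hexdigits for ch in token)
            if len(token) != 2 or not is_hex:
                raise ValueError(
                    f"line {line_no}: expected two hex digits, got {token!r}"
                )
            out.append(int(token, 16))
    return bytes(out)


if __name__ == "__main__":
    program = '48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 0a ; "Hello, world\\n"'
    print(assemble(program))  # b'Hello, world\n'
```

Rejecting anything that isn't exactly two hex digits keeps the format unambiguous: one token always means one byte in the output.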

Secondly, to create the smallest possible binary, there's only one obvious choice of programming language: assembly.

The assembler I decided to use was FASM, which I discovered from a Tsoding VOD and which opened my eyes to the world of small executables.

The final rule is there just because I was too lazy to add any build options other than Linux.

As a bonus challenge I built the entire assembler with zero statically or dynamically linked dependencies (including libc).

Actually Building the Thing