Mugo, a toy compiler for a subset of Go that can compile itself

https://benhoyt.com/writings/mugo/

Summary: This article presents Mugo, a single-pass compiler for a tiny subset of the Go programming language. It outputs (very naive) x86-64 assembly, and supports just enough of the language to implement a Mugo compiler: int and string types, slices, functions, locals, globals, and basic expressions and statements.

I’ve been fascinated with compilers since I started coding. One of my first programming projects was “Third”, a self-hosted Forth compiler for 8086 DOS. Forth is incredibly easy to compile: there are no expressions or statements, and each space-separated token gets compiled directly to a call instruction – often via techniques like direct threading.
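As a rough illustration of why Forth is so easy to compile, here is a minimal sketch in Go. It is not code from Third: the function name compileForth and the pseudo-assembly it emits are made up for this example, and treating integer literals as a "push" is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// compileForth (hypothetical) turns each space-separated token into one
// line of pseudo-assembly: "push" for integer literals, "call" for any
// other word. There is no parser, only a loop over tokens.
func compileForth(src string) []string {
	var out []string
	for _, tok := range strings.Fields(src) {
		if _, err := strconv.Atoi(tok); err == nil {
			out = append(out, "push "+tok)
		} else {
			out = append(out, "call "+tok)
		}
	}
	return out
}

func main() {
	for _, line := range compileForth("2 3 + .") {
		fmt.Println(line)
	}
}
```

The whole "compiler" is a single loop with no notion of precedence or nesting, which is exactly what a language with expressions and statements cannot get away with.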

Typical languages like C and Go have more complex syntax with expressions and statements, so they require real parsers and code generators. Compilers for these languages are usually complex and powerful, but as we’ll see, if you stick to basic types and non-optimized output, you can still write a simple one.

Mugo is kind of in the spirit of the Obfuscated Tiny C Compiler by Fabrice Bellard, though of course mine’s far more pedestrian, and won’t be winning the IOCCC anytime soon. Bellard’s compiler implements just enough C to compile itself to a native i386 Linux executable.

I wanted to do something a bit like that with Go, minus the obfuscation. The idea started as a shower thought: “I wonder what’s the smallest subset of Go that could compile itself?” Fabrice’s C compiler is implemented in 2048 bytes of obfuscated C, whereas mine is 1600 lines of formatted Go.

While this was a fun exercise to do over a long weekend, it’s very much a toy – it leaves out all the great features of Go: user-defined types, interfaces, goroutines, channels, maps, garbage collection, even bounds checking! My goal with Mugo was educational: for me, and hopefully also for you. Doing exercises like this helps demystify how our tools work.

Which subset of Go?

Mugo is a subset of Go, so the source code can be compiled with Go as well as with Mugo. In my opinion, this makes it much more interesting. It also made it easier to test: when the assembly output of the Go-compiled version was identical to the output of the Mugo-compiled version, I knew it was working – it was a beautiful thing when diff mugo2.asm mugo3.asm showed no output!

Before I started, I mulled over which subset of features I should include. I knew I’d need some kind of container type to store compiler state: for example, the names and types of variables, and function signatures and return types. But which container?

Go has pointers, which are safer than C’s but not nearly as powerful, because you can’t do pointer arithmetic. Bellard’s compiler makes heavy use of C pointers, but those weren’t going to work in Go.

What about structs or maps? Well, those were going to be more complex to implement, and don’t really solve the most common problem of storing a list of things. So I decided I could do without all those, and just needed slices.
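To show how slices alone can hold compiler state, here is a small sketch that uses two parallel slices in place of a slice of structs or a map. The names localNames, localTypes, addLocal and findLocal are hypothetical, chosen for this example rather than taken from the Mugo source.

```go
package main

// Hypothetical compiler state: item i of localNames and item i of
// localTypes together describe one local variable.
var localNames []string
var localTypes []string

// addLocal records a local variable's name and type.
func addLocal(name string, typ string) {
	localNames = append(localNames, name)
	localTypes = append(localTypes, typ)
}

// findLocal returns the index of name, or -1 if it has not been added.
// A linear scan stands in for a map lookup.
func findLocal(name string) int {
	i := 0
	for i < len(localNames) {
		if localNames[i] == name {
			return i
		}
		i = i + 1
	}
	return -1
}

func main() {
	addLocal("count", "int")
	addLocal("names", "[]string")
	println(localTypes[findLocal("names")]) // prints "[]string"
	println(findLocal("missing"))           // prints -1
}
```

The lookup is linear rather than constant-time, but for a toy compiler with a handful of names per function that is unlikely to matter.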

Here’s what Mugo supports:

The int type, decimal integer literals, character constants, and most expressions that operate on int: +, -, *, /, %, ==, !=, <, <=, >, and >=, with operator precedence handled as per Go. The compiler recognizes the type name bool but treats it identically to int (&&, ||, and ! operate on these pseudo-bools).
The string type, including string constants with \ escapes, string equality tests using == or !=, string concatenation with +, and len().
Slices, but only []int and []string. Slice literals and make() are not supported, so to build a slice you have to create an empty slice and append to it. Fetching and assigning slice items is supported, as are slice[:n] expressions and len().
Type checking is present but incomplete. I check types in places where it made sense or where it helped with debugging, but it’s certainly not exhaustive.
Statements: if and else, for condition { ... }, return, and Go’s := short variable declaration.
Variables and constants. However, var and const are only supported at the top level; you have to use := for locals (which is more common in Go anyway). Only typed integer constants are supported.
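The list above can be made concrete with a short program that tries to stay inside the subset as described. This is a sketch, not a program from the Mugo repository: it is valid Go, but whether Mugo accepts every line is an assumption. In particular, println is Go’s builtin, and this section does not say how Mugo itself handles output.

```go
package main

// var and const appear only at the top level; the constant is a typed int.
const limit int = 5

// An empty slice, to be filled with append (no slice literals or make).
var squares []int

// sumTo adds up the first n items of xs using the "for condition" loop form.
func sumTo(xs []int, n int) int {
	total := 0
	i := 0
	for i < n {
		total = total + xs[i]
		i = i + 1
	}
	return total
}

// label returns a string based on an int comparison.
func label(n int) string {
	if n%2 == 0 {
		return "even"
	} else {
		return "odd"
	}
}

func main() {
	i := 1
	for i <= limit {
		squares = append(squares, i*i)
		i = i + 1
	}
	first := squares[:3] // slice[:n] expression
	total := sumTo(squares, len(squares))
	println(total, len(first), "sum is "+label(total))
}
```

Locals are declared with :=, and the loop counter is advanced with i = i + 1 because the list does not mention i++ or +=.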