https://benhoyt.com/writings/mugo/

Summary: This article presents Mugo, a single-pass compiler for a tiny subset of the Go programming language. It outputs (very naive) x86-64 assembly, and supports just enough of the language to implement a Mugo compiler: int and string types, slices, functions, locals, globals, and basic expressions and statements.

I’ve been fascinated with compilers since I started coding. One of my first programming projects was “Third”, a self-hosted Forth compiler for 8086 DOS. Forth is incredibly easy to compile: there are no expressions or statements, and each space-separated token gets compiled directly to a call instruction, a technique known as subroutine threading.

Typical languages like C and Go have more complex syntax with expressions and statements, so they require real parsers and code generators. Compilers for these languages are usually complex and powerful, but as we’ll see, if you stick to basic types and non-optimized output, you can still write a simple one.

Mugo is kind of in the spirit of the Obfuscated Tiny C Compiler by Fabrice Bellard, though of course mine’s far more pedestrian, and won’t be winning the IOCCC anytime soon. Bellard’s compiler implements just enough C to compile itself to a native i386 Linux executable.

I wanted to do something a bit like that with Go, minus the obfuscation. The idea started as a shower thought: “I wonder what’s the smallest subset of Go that could compile itself?” Fabrice’s C compiler is implemented in 2048 bytes of obfuscated C, whereas mine is 1600 lines of formatted Go.

While this was a fun exercise to do over a long weekend, it’s very much a toy – it leaves out all the great features of Go: user-defined types, interfaces, goroutines, channels, maps, garbage collection, even bounds checking! My goal with Mugo was educational: for me, and hopefully also for you. Doing exercises like this helps demystify how our tools work.

Which subset of Go?

Mugo is a subset of Go, so the source code can be compiled with Go as well as with Mugo. In my opinion, this makes it much more interesting. It also made it easier to test: when the assembly output of the Go-compiled version was identical to the output of the Mugo-compiled version, I knew it was working – it was a beautiful thing when diff mugo2.asm mugo3.asm showed no output!

Before I started, I mulled over which subset of features I should include. I knew I’d need some kind of container type to store compiler state: for example, the names and types of variables, and function signatures (parameter and return types). But which container?

Go has pointers, but they’re safer and not nearly as powerful as C’s, because you can’t do pointer arithmetic. Bellard’s compiler makes heavy use of C pointers, but that approach wasn’t going to work in Go.

What about structs or maps? Well, those were going to be more complex to implement, and don’t really solve the most common problem of storing a list of things. So I decided I could do without all those, and just needed slices.
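To make the slices-only idea concrete, here is a minimal sketch of how compiler state might be kept without structs or maps: two parallel slices, where index i in each describes the same variable. The names varNames, varTypes, defineVar, findVar, and the type codes are hypothetical, chosen for illustration rather than taken from Mugo's source, and the sketch is ordinary Go that is not claimed to compile under Mugo itself.

```go
package main

import "fmt"

// Type codes are plain ints, since this sketch avoids user-defined types.
// (Hypothetical values, not Mugo's actual constants.)
const (
	typeInt    = 1
	typeString = 2
)

// Parallel slices: index i in each slice describes the same variable.
var (
	varNames []string
	varTypes []int
)

// defineVar records a variable and returns its index.
func defineVar(name string, typ int) int {
	varNames = append(varNames, name)
	varTypes = append(varTypes, typ)
	return len(varNames) - 1
}

// findVar does a linear search and returns the variable's index,
// or -1 if the name has not been defined.
func findVar(name string) int {
	i := 0
	for i < len(varNames) {
		if varNames[i] == name {
			return i
		}
		i = i + 1
	}
	return -1
}

func main() {
	defineVar("count", typeInt)
	defineVar("label", typeString)
	i := findVar("label")
	fmt.Println(i, varNames[i], varTypes[i])
	fmt.Println(findVar("missing"))
}
```

A linear search stands in for the map lookup a full Go program would use. For a toy compiler with a handful of names per scope, the cost is negligible, and the only container the language needs to support is the slice.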

Here’s what Mugo supports: