Black box compiler phases considered harmful

Stop throwing away perfectly usable information!

isolating information is wasteful

Have you ever thought about how much information is thrown away in phase-based compilation?

One example that comes to mind is the minimal approach to resolving local bindings: maintaining a map-like structure (often called the environment) where bindings are inserted and removed as the compiler enters and leaves AST nodes.

While this is a perfectly valid solution, the information it produces is local to the traversal being performed and cannot be used anywhere else. Wouldn’t it be nice if the compiler could also use this information to provide autocomplete for local bindings?
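To make the problem concrete, here is a minimal sketch of the environment approach in Python. The node types (`Let`, `Var`, `Lit`) and the `unbound_names` function are hypothetical names made up for illustration, not taken from any particular compiler.

```python
from dataclasses import dataclass


@dataclass
class Lit:
    """Hypothetical AST node: an integer literal."""
    value: int


@dataclass
class Var:
    """Hypothetical AST node: a reference to a binding."""
    name: str


@dataclass
class Let:
    """Hypothetical AST node: `let name = value in body`."""
    name: str
    value: object
    body: object


def unbound_names(node, env):
    """Return the names of variables that refer to no enclosing binding."""
    if isinstance(node, Let):
        errors = unbound_names(node.value, env)
        shadowed = env.get(node.name)  # remember any outer binding
        env[node.name] = node          # insert on the way in
        errors += unbound_names(node.body, env)
        if shadowed is None:           # remove (or restore) on the way out
            del env[node.name]
        else:
            env[node.name] = shadowed
        return errors
    if isinstance(node, Var):
        return [] if node.name in env else [node.name]
    return []


# let x = 1 in (let y = x in z)
tree = Let("x", Lit(1), Let("y", Var("x"), Var("z")))
env = {}
errors = unbound_names(tree, env)
```

After the call, `errors` holds the one unbound name, but `env` is empty again. Everything the traversal knew about which bindings were visible at which point is gone by the time an editor could ask for it.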

Eventually, you may arrive at scope graphs as the preferred solution. Scope graphs encode name resolution rules independently of the syntax tree, making them well suited for use both within and outside of compiler phases.
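As a rough illustration, here is a heavily simplified sketch in Python. Real scope graphs have labeled edges, imports, and visibility rules; this version assumes only lexical nesting with a single parent edge per scope, and the `Scope` class and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """Hypothetical scope node: declarations plus an edge to the parent scope."""
    parent: Optional["Scope"] = None
    declarations: dict = field(default_factory=dict)

    def declare(self, name, info):
        self.declarations[name] = info

    def resolve(self, name):
        """Follow parent edges until a declaration of `name` is found."""
        scope = self
        while scope is not None:
            if name in scope.declarations:
                return scope.declarations[name]
            scope = scope.parent
        return None

    def visible_names(self):
        """Every name reachable from this scope, e.g. for autocomplete."""
        names = {}
        scope = self
        while scope is not None:
            for name, info in scope.declarations.items():
                names.setdefault(name, info)  # inner declarations shadow outer ones
            scope = scope.parent
        return names


root = Scope()
root.declare("x", "declared at line 1")
inner = Scope(parent=root)
inner.declare("y", "declared at line 2")

completions = sorted(inner.visible_names())  # ["x", "y"]
```

Because the graph outlives the traversal that built it, the same structure can answer both the compiler’s question ("what does `x` refer to here?") and the editor’s ("which names can I offer at this position?").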

attaching semantic information to syntax

Using structures that remain usable outside compiler phases is just the beginning. Compilers can go the extra kilometer and attach semantic information directly to syntax nodes, making it easier to get to.

To name a few examples: during type inference, the compiler can attach type information to expressions; during name resolution, the compiler can attach the associated scope graph nodes to each expression.
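Here is a minimal sketch of the type inference case in Python. It assumes a toy language with integer and boolean literals and addition; the node classes and the `infer` function are hypothetical. A scope reference could be attached the same way, as one more optional field on each node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntLit:
    value: int
    type: Optional[str] = None  # filled in by inference


@dataclass
class BoolLit:
    value: bool
    type: Optional[str] = None  # filled in by inference


@dataclass
class Add:
    left: object
    right: object
    type: Optional[str] = None  # filled in by inference


def infer(node):
    """Annotate `node` and its children in place, then return the node's type."""
    if isinstance(node, IntLit):
        node.type = "int"
    elif isinstance(node, BoolLit):
        node.type = "bool"
    elif isinstance(node, Add):
        left, right = infer(node.left), infer(node.right)
        node.type = "int" if left == right == "int" else "error"
    return node.type


# 1 + (2 + 3)
tree = Add(IntLit(1), Add(IntLit(2), IntLit(3)))
infer(tree)
hover_type = tree.right.type  # what an editor could show when hovering `2 + 3`
```

Once inference has run, any later consumer (a hover provider, a linter, another phase) can read `node.type` without redoing the work.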

conclusion

End users will treat a compiler as a black box, and that’s fine. But now that compilers are expected to provide meaningful editor integrations, I think it’s more important than ever to design them around language-server-style use.

I highly recommend watching this gem of a YouTube video: Anders Hejlsberg on Modern Compiler Construction.