Tree-Walking Interpretation
What Is Tree-Walking Interpretation?
A tree-walking interpreter evaluates a program by recursively traversing its abstract syntax tree and computing results at each node. When it encounters a + b, it recursively evaluates a to get a value, recursively evaluates b to get a value, then adds them. When it encounters if c then t else e, it evaluates c, checks whether it is true, and recursively evaluates either t or e. The evaluation function’s structure mirrors the grammar’s structure — one case per syntactic construct, recursive calls for sub-expressions.
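The shape of such an evaluator can be sketched in a few lines. This is an illustrative miniature, not Ori's actual code: a toy `Expr` with one variant per construct, and an `eval` whose match arms mirror the grammar, recursing into sub-expressions.

```rust
// Minimal tree-walking evaluator: one enum variant per syntactic
// construct, one match arm per variant, recursion for sub-expressions.
enum Expr {
    Int(i64),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
}

fn eval(e: &Expr) -> Value {
    match e {
        Expr::Int(n) => Value::Int(*n),
        Expr::Bool(b) => Value::Bool(*b),
        Expr::Add(a, b) => match (eval(a), eval(b)) {
            (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
            _ => panic!("type error"),
        },
        Expr::If(c, t, f) => match eval(c) {
            Value::Bool(true) => eval(t),
            Value::Bool(false) => eval(f),
            _ => panic!("type error"),
        },
    }
}

fn main() {
    // if true then 1 + 2 else 0
    let prog = Expr::If(
        Box::new(Expr::Bool(true)),
        Box::new(Expr::Add(Box::new(Expr::Int(1)), Box::new(Expr::Int(2)))),
        Box::new(Expr::Int(0)),
    );
    assert_eq!(eval(&prog), Value::Int(3));
}
```

Note how `If` evaluates only one branch — the interpreter's control flow *is* the language's control flow.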
This approach dates to McCarthy’s original 1960 definition of Lisp, where eval was a recursive function over S-expressions. It remains the standard way to define a language’s semantics precisely — the definitional interpreter approach. Denotational semantics, operational semantics, and interpreter-based language specifications all describe this same recursive structure. When a language specification says “the value of if e1 then e2 else e3 is the value of e2 if e1 evaluates to true, otherwise the value of e3,” it is describing exactly what a tree-walking interpreter does.
The Execution Strategy Spectrum
Tree-walking sits at one end of a spectrum of execution strategies. Understanding the full spectrum clarifies why Ori chose tree-walking for development and AOT compilation for production:
| Strategy | How It Works | Speed | Startup | Examples |
|---|---|---|---|---|
| Tree-walking | Recursive dispatch on AST nodes | Slowest | Instant | Ori (dev), early Ruby, Roc REPL |
| Bytecode VM | Compile to bytecodes, dispatch loop | 5-20x faster | Fast | CPython, Lua, Ruby 1.9+, Erlang |
| JIT compilation | Compile hot paths to machine code at runtime | 10-100x faster | Warm-up delay | V8, JVM HotSpot, LuaJIT, PyPy |
| AOT compilation | Compile entire program to machine code ahead of time | Fastest | Compile step | Ori (prod), GCC, Rust, Go, Zig |
Each step down the table trades simplicity and startup speed for steady-state performance. Tree-walking interpreters are the simplest to build and maintain but the slowest to execute. AOT compilers produce the fastest executables but require a full compilation step. Bytecode VMs and JITs sit between, balancing complexity and performance.
Ori bypasses the middle of this spectrum: tree-walking for development (where instant startup and debuggability matter more than throughput), AOT via LLVM for production (where maximum performance matters). This avoids the engineering cost of designing a bytecode instruction set and maintaining a bytecode compiler — complexity that would deliver only modest improvements for the development workflow.
Why Not a Bytecode VM?
A bytecode VM would provide 5-20x speedup over tree-walking. For Ori’s development use case, this speedup rarely matters:
- Test suites are typically I/O-bound (file loading, Salsa caching) rather than CPU-bound
- Type checking dominates compile time; evaluation is a small fraction
- The REPL and playground evaluate short programs where interpretation overhead is imperceptible
- Production performance comes from LLVM, which provides 100-1000x speedup over tree-walking — making a bytecode VM an intermediate step with limited value
The engineering cost of a bytecode VM is substantial: instruction set design, a compiler from CanExpr to bytecodes, a dispatch loop (switch-threaded, direct-threaded, or computed-goto), a value stack, and all the associated debugging infrastructure. This cost buys performance improvements primarily for long-running interpreted programs — a use case Ori handles with AOT compilation instead.
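To make the contrast concrete, here is a toy switch-threaded bytecode VM with a hypothetical three-instruction set — the instruction set, value stack, and dispatch loop are exactly the extra machinery the paragraph above describes:

```rust
// Toy switch-threaded bytecode VM: an instruction set, a value stack,
// and a dispatch loop. A real VM also needs a compiler from the IR to
// this instruction set, plus debugging infrastructure.
enum Op {
    Push(i64),
    Add,
    Mul,
}

fn run(code: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    for op in code {
        match op {
            Op::Push(n) => stack.push(*n),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    // (2 + 3) * 4, already "compiled" to postorder bytecode
    let code = [Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul];
    assert_eq!(run(&code), 20);
}
```

The speedup over tree-walking comes from replacing pointer-chasing recursion over heterogeneous AST nodes with a tight loop over a flat instruction array.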
Ori’s Tree-Walking Design
The Entry Point: eval_can
All evaluation flows through a single entry point:
```rust
impl Interpreter<'_> {
    pub fn eval_can(&mut self, can_id: CanId) -> EvalResult {
        ensure_sufficient_stack(|| self.eval_can_inner(can_id))
    }
}
```
The ensure_sufficient_stack wrapper (from the stacker crate) checks that the native thread stack has enough space for the recursive call. If the stack is running low, it allocates a new segment. This prevents stack overflows on deeply nested expressions — a real concern for tree-walking interpreters, which use the host language’s call stack as the evaluation stack.
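The stacker crate's `maybe_grow` allocates new stack segments on demand. A cruder, stdlib-only way to see the same concern — not what Ori does, just an illustration — is to run a deliberately deep recursion on a thread created with a generous fixed stack size:

```rust
use std::thread;

// A deep, non-tail recursion of the kind that can overflow a
// default-sized thread stack. Running it on a thread with a large
// fixed stack is a blunt mitigation; stacker instead grows the stack
// in segments only when a check shows space is running low.
fn depth(n: u64) -> u64 {
    if n == 0 { 0 } else { 1 + depth(n - 1) }
}

fn main() {
    let handle = thread::Builder::new()
        .stack_size(64 * 1024 * 1024) // 64 MiB, vs. the platform default
        .spawn(|| depth(200_000))
        .unwrap();
    assert_eq!(handle.join().unwrap(), 200_000);
}
```

The segment-growing approach is strictly better for an interpreter: it pays for extra stack only when a pathological input actually needs it.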
The Copy-Out Pattern
The inner dispatch function uses a critical pattern that deserves detailed explanation:
```rust
fn eval_can_inner(&mut self, can_id: CanId) -> EvalResult {
    let kind = *self.canon_ref().arena.kind(can_id);
    match kind {
        CanExpr::Int(n) => Ok(Value::int(n)),
        // ... 50+ variants
    }
}
```
The first line copies the CanExpr value out of the arena before dispatching. This is not an accident — it is essential for correctness. Here is why:
self.canon_ref() borrows self immutably to access the canonical IR. If we matched directly on self.canon_ref().arena.kind(can_id), the immutable borrow on self would persist for the entire match body. But each match arm needs to call self.eval_can() recursively, which requires a mutable borrow on self. Holding an immutable and mutable borrow simultaneously violates Rust’s borrowing rules.
The copy-out pattern resolves this: copy the 24-byte CanExpr value (which is Copy) onto the stack, immediately releasing the immutable borrow on self. Now each match arm can freely call self.eval_can() and other &mut self methods. The 24-byte copy is trivially cheap — it fits in three 64-bit registers.
This pattern is Ori-specific but the underlying problem is universal. Any tree-walking interpreter written in a language with strict aliasing or ownership rules (Rust, C++ with strict aliasing, linear types) must solve the “read the tree, then mutate the evaluator” tension. Copying the dispatch tag out of the tree is the simplest and cheapest solution.
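A self-contained miniature of the pattern (hypothetical `Node`/`Interp` types, not Ori's): matching directly on `self.arena[id]` would hold the place borrow through the match body and conflict with the recursive `&mut self` calls, so the node is copied out first.

```rust
// The copy-out pattern in miniature: `Node` is small and `Copy`, so
// the evaluator copies it out of the arena before matching, ending
// the borrow of `self` and freeing each arm to call `&mut self`
// methods recursively.
#[derive(Clone, Copy)]
enum Node {
    Int(i64),
    Add(usize, usize), // indices of child nodes in the arena
}

struct Interp {
    arena: Vec<Node>,
    steps: u64, // mutable evaluator state
}

impl Interp {
    fn eval(&mut self, id: usize) -> i64 {
        // Copy the node out; the arena is borrowed only for this line.
        let node = self.arena[id];
        self.steps += 1; // mutating self is now fine
        match node {
            Node::Int(n) => n,
            // Each arm can recurse with &mut self, no borrow conflict.
            Node::Add(a, b) => self.eval(a) + self.eval(b),
        }
    }
}

fn main() {
    // arena for (1 + 2) + 3
    let arena = vec![
        Node::Int(1),    // 0
        Node::Int(2),    // 1
        Node::Add(0, 1), // 2
        Node::Int(3),    // 3
        Node::Add(2, 3), // 4
    ];
    let mut interp = Interp { arena, steps: 0 };
    assert_eq!(interp.eval(4), 6);
    assert_eq!(interp.steps, 5); // one dispatch per node
}
```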
Exhaustive Dispatch
The eval_can_inner function matches exhaustively on all CanExpr variants — there is no _ => catch-all arm. This is a deliberate safety measure: when a new variant is added to CanExpr (during language evolution), the Rust compiler forces every match site to handle it. A catch-all arm would silently pass through unhandled variants, potentially producing wrong results rather than compile errors.
The dispatch covers roughly 54 variants, organized by category:
Literals — direct value construction, no recursion:
- `Int(i64)`, `Float(u64)`, `Bool(bool)`, `Str(Name)`, `Char(char)`, `Unit`
- `Duration { value, unit }`, `Size { value, unit }` — unit conversion at evaluation time
- `Constant(ConstId)` — lookup in the pre-computed `ConstantPool`
References — environment lookup or type reference:
- `Ident(name)` — variable lookup in the scope chain
- `Const(name)` — immutable constant reference
- `TypeRef(name)` — resolved type reference (for associated function calls like `Point.origin()`)
- `FunctionRef(name)` — named function reference
- `SelfRef` — `self` in method bodies
Operators — dual dispatch (primitives direct, user types via traits):
- `Binary { left, op, right }` — delegated to `eval_can_binary()`
- `Unary { op, operand }` — negation, logical not, bitwise not
- `Cast { expr, target, fallible }` — type casts (`as` / `as?`)
Control flow — delegated to control_flow.rs:
- `If { cond, then, else_ }` — conditional branching
- `Block(range)` — evaluate statements in order, return last
- `Loop { body }` — infinite loop with `break` value
- `For { binding, iter, body, yields }` — iterator-based loop
- `While { cond, body }` — condition-checked loop
- `Match { scrutinee, tree_id }` — decision tree evaluation
Calls and methods:
- `Call { func, args }` — evaluate callee and arguments, then dispatch
- `MethodCall { receiver, method, args }` — method dispatch chain
Collections — evaluate elements into containers:
- `List(range)`, `Map(entries)`, `Tuple(range)`
Bindings and access:
- `Let { pattern, value, .. }` — pattern destructuring with `bind_can_pattern()`
- `FieldAccess { expr, field }`, `Index { expr, index }`, `Range { start, end, step, inclusive }`
- `StructLit`, `Lambda`, `Assign`
Error handling and capabilities:
- `Try(expr)` — error propagation (the `?` operator)
- `Unsafe(expr)` — transparent wrapper (evaluates the inner expression)
- `WithCapability { capability, provider, body }` — `with Cap = x in body`
Decision Tree Evaluation
Match expressions do not use sequential arm testing. The canonicalization phase compiles patterns into decision trees using the Maranget 2008 algorithm. The evaluator walks these trees:
- Leaf — binds captured variables and evaluates the arm body
- Guard — evaluates a guard expression; on failure, falls through to `on_fail`
- Switch — tests the scrutinee at a path (enum tag, bool value, list length) and follows the matching edge
- Fail — non-exhaustive match error (should be caught by exhaustiveness checking at compile time)
This approach avoids the O(patterns × depth) worst case of sequential testing. Each switch node splits the search space, and the tree’s depth is bounded by the pattern depth rather than the number of arms.
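A stripped-down sketch of the walk (hypothetical structures; guards and variable bindings omitted, and Ori's real switch nodes test paths into the scrutinee rather than the scrutinee itself):

```rust
// Toy decision-tree evaluator. The tree in main() is roughly what a
// pattern compiler might emit for: match x { 0 => "zero", _ => "other" }
enum Tree {
    Leaf(&'static str),                  // arm body to run
    Switch(Vec<(i64, Tree)>, Box<Tree>), // (tag, subtree) edges + default
    Fail,                                // non-exhaustive match
}

fn eval_tree(tree: &Tree, scrutinee: i64) -> Option<&'static str> {
    match tree {
        Tree::Leaf(body) => Some(body),
        Tree::Switch(edges, default) => edges
            .iter()
            .find(|(tag, _)| *tag == scrutinee)
            .map(|(_, sub)| eval_tree(sub, scrutinee))
            .unwrap_or_else(|| eval_tree(default, scrutinee)),
        Tree::Fail => None,
    }
}

fn main() {
    let tree = Tree::Switch(
        vec![(0, Tree::Leaf("zero"))],
        Box::new(Tree::Leaf("other")),
    );
    assert_eq!(eval_tree(&tree, 0), Some("zero"));
    assert_eq!(eval_tree(&tree, 7), Some("other"));
}
```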
Function Call Evaluation
Function calls follow a consistent protocol:
- Evaluate the callee to get a callable value (`Function`, `FunctionVal`, `VariantConstructor`, `NewtypeConstructor`)
- Evaluate all arguments — canonical IR has already reordered named arguments to positional form
- Push a call frame onto the call stack (for recursion depth tracking and backtrace capture)
- Create a child interpreter with the callee's arena (arena threading for cross-module calls)
- Bind parameters in the child's environment
- Evaluate the body via `eval_can(body_can_id)`
- Pop the call frame (via the RAII `ScopedInterpreter` — guaranteed even on panic)
The child interpreter creation is the most distinctive step. Each FunctionValue carries a SharedArena reference to the arena where its body expressions live. The evaluator creates a new Interpreter bound to that arena, inheriting the user method registry (via SharedMutableRegistry) but with its own scope stack. This ensures expression ID lookups always hit the correct arena.
Literal Evaluation
Literals in canonical IR are already desugared — no Literal enum wrapper. Each literal type is a direct CanExpr variant with the value inline:
- `CanExpr::Int(i64)` → `Value::int(n)` — wraps in `ScalarInt` for checked arithmetic
- `CanExpr::Float(u64)` → `Value::Float(f64::from_bits(bits))` — stored as a bit pattern for Salsa compatibility
- `CanExpr::Str(Name)` → `Value::string_static(interned_str)` — zero-copy from the interner via `Cow::Borrowed`
- `CanExpr::Constant(ConstId)` → looked up in the `ConstantPool` (pre-folded at compile time by constant folding)
The float bit-pattern encoding is worth noting: floats are stored as u64 in the canonical IR because f64 does not implement Eq or Hash (due to NaN), which would prevent CanExpr from being used in Salsa’s query system. The from_bits conversion at evaluation time is a no-op on all modern architectures.
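A small demonstration of why the bit-pattern trick works — `to_bits`/`from_bits` round-trip losslessly, and the `u64` form is usable as a hash-map key where `f64` would not even compile:

```rust
use std::collections::HashMap;

fn main() {
    let x: f64 = 2.5;
    let bits: u64 = x.to_bits();           // store in the IR as u64
    assert_eq!(f64::from_bits(bits), 2.5); // recover at evaluation time

    // u64 implements Eq + Hash, so it works as a query/map key;
    // HashMap<f64, _> is rejected by the compiler because f64: !Eq.
    let mut pool: HashMap<u64, &str> = HashMap::new();
    pool.insert(bits, "2.5");
    assert_eq!(pool[&x.to_bits()], "2.5");

    // NaN is the reason f64 lacks Eq: a NaN is not equal to itself,
    // but its bit pattern compares equal like any other integer.
    let nan = f64::NAN;
    assert!(nan != nan);
    assert_eq!(nan.to_bits(), nan.to_bits());
}
```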
Binary Operator Dispatch
Binary operators use a tiered dispatch strategy:
- Short-circuit first: `&&` and `||` evaluate the left operand and skip the right if the result is already determined
- Comparison operators: `==`, `!=`, `<`, `<=`, `>`, `>=` use direct evaluation for primitives
- Mixed-type operations: `int * Duration`, `int * Size` — detected and handled directly because primitives do not implement operator trait methods
- Primitive arithmetic: `int + int`, `float * float`, etc. — direct evaluation without method lookup
- User-type operators: if none of the above applies, dispatch through operator trait methods (`Add::add`, etc.)
This tiered approach avoids the overhead of method resolution for the common case (primitive operations) while supporting user-defined operator overloading for custom types.
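The tiers can be sketched as follows (hypothetical `Value` and operator encoding, with the trait fallback stubbed out). Note that the right operand is passed as a thunk so the short-circuit tier can decline to evaluate it:

```rust
// Tiered binary-operator dispatch in miniature: short-circuit ops
// first, then a primitive fast path, then a (stubbed) operator-trait
// fallback for user-defined types.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
    User(&'static str), // stand-in for a user-defined type
}

fn eval_binary(op: &str, lhs: Value, rhs: impl FnOnce() -> Value) -> Value {
    // Tier 1: short-circuit -- rhs is a thunk, forced only if needed.
    if op == "&&" {
        return match lhs {
            Value::Bool(false) => Value::Bool(false),
            Value::Bool(true) => rhs(),
            _ => panic!("type error"),
        };
    }
    let rhs = rhs();
    // Tier 2: primitive fast path, no method lookup.
    if let (Value::Int(a), Value::Int(b)) = (&lhs, &rhs) {
        return match op {
            "+" => Value::Int(a + b),
            "==" => Value::Bool(a == b),
            _ => panic!("unknown op"),
        };
    }
    // Tier 3: fall back to operator-trait dispatch for user types
    // (stubbed here as a fixed tag).
    Value::User("Add::add result")
}

fn main() {
    assert_eq!(
        eval_binary("+", Value::Int(2), || Value::Int(3)),
        Value::Int(5)
    );
    // Short-circuit: the right-hand thunk is never forced.
    assert_eq!(
        eval_binary("&&", Value::Bool(false), || panic!("not evaluated")),
        Value::Bool(false)
    );
}
```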
Prior Art
Roc’s eval module is the closest structural analog. Like Ori, Roc evaluates a canonicalized IR (not the raw AST) via tree-walking. Roc’s canonical expressions are defined in can::expr::Expr — a sugar-free enum similar to Ori’s CanExpr. Both interpreters use exhaustive matching with no catch-all arms. The key difference is that Roc’s interpreter is integrated with its compilation infrastructure, while Ori’s ori_eval is deliberately standalone.
GHC’s Core evaluator walks a tiny functional calculus (System FC) with roughly 10 expression forms. This demonstrates the power of aggressive desugaring — Haskell’s rich surface syntax maps to a handful of core constructs. Ori’s canonical IR is larger (~54 variants) because it preserves more operational detail (distinct loop forms, try expressions, format specifiers) rather than encoding everything as function application.
Zig’s comptime interpreter in Sema.zig evaluates compile-time expressions by walking ZIR (Zig Intermediate Representation). Like Ori’s ConstEval mode, it operates with budget limits and restricted capabilities. Zig’s interpreter is more tightly integrated with semantic analysis — it runs during type checking rather than as a separate phase.
Lua has compiled to bytecode since its first release; its interpreter before 5.0 used a stack-based VM, and the transition to a register-based VM in 5.0 delivered a further speedup (roughly 2x on some benchmarks). This illustrates how much execution strategy matters even among interpreters — and why tree-walking sits at the bottom of the performance spectrum. Ori addresses this differently — via AOT compilation rather than a bytecode VM.
Design Tradeoffs
Exhaustive match vs catch-all. Exhaustive matching on 54+ variants creates large match expressions that are harder to read than a short match with a _ => unreachable!() catch-all. But the safety benefit is significant: adding a new CanExpr variant immediately surfaces every evaluation site that needs updating. This is the same reasoning that Rust uses for requiring exhaustive enum matches — the compile-time cost prevents runtime bugs.
Copy-out vs borrow splitting. An alternative to copying CanExpr out of the arena would be to split the Interpreter struct so that the canonical IR is accessed through a separate reference that does not conflict with the mutable self borrow. This would avoid the copy but require restructuring the interpreter’s data layout. At 24 bytes per copy, the current approach has negligible overhead and keeps the struct design simple.
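For contrast, here is what the borrow-splitting alternative looks like in miniature (hypothetical types): the read-only IR and the mutable evaluator state become separate parameters, so matching on a node never conflicts with mutation.

```rust
// Borrow splitting: the arena and the mutable state are disjoint
// parameters, so holding `&arena[id]` across the match does not lock
// out mutation of `state`. No copy needed, but the evaluator's data
// can no longer live behind a single `&mut self`.
#[derive(Clone, Copy)]
enum Node {
    Int(i64),
    Add(usize, usize),
}

struct State {
    steps: u64,
}

fn eval(arena: &[Node], state: &mut State, id: usize) -> i64 {
    state.steps += 1;
    match &arena[id] {
        Node::Int(n) => *n,
        Node::Add(a, b) => eval(arena, state, *a) + eval(arena, state, *b),
    }
}

fn main() {
    let arena = [Node::Int(1), Node::Int(2), Node::Add(0, 1)];
    let mut state = State { steps: 0 };
    assert_eq!(eval(&arena, &mut state, 2), 3);
    assert_eq!(state.steps, 3);
}
```

The cost is ergonomic rather than performance-related: every helper must thread both references, where `&mut self` methods keep call sites short.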
Single eval_can vs visitor pattern. Some interpreters use a visitor pattern where each expression type has an accept() method that dispatches to a visitor. This is common in Java-based interpreters (following the pattern in Nystrom’s Crafting Interpreters). Ori uses a single match expression instead — more idiomatic in Rust, avoids dynamic dispatch, and keeps the evaluation logic co-located rather than scattered across accept methods.
ensure_sufficient_stack vs explicit trampoline. The stacker crate’s stack extension is a pragmatic choice. A trampoline (converting recursion to an explicit stack of continuations) would give full control over stack usage but requires significant restructuring of the evaluation logic. The stacker approach preserves the natural recursive structure while preventing stack overflows.
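To make the "significant restructuring" concrete, here is a trampolined evaluator for a two-variant toy IR: the recursion becomes an explicit work stack of pending operations plus a result stack, so evaluation depth is bounded by the heap rather than the native call stack.

```rust
// Recursion converted to an explicit work stack. Deeply nested
// expressions that would overflow a naive recursive evaluator are
// handled without growing the native call stack at all.
#[derive(Clone, Copy)]
enum Node {
    Int(i64),
    Add(usize, usize),
}

fn eval(arena: &[Node], root: usize) -> i64 {
    enum Work {
        Eval(usize), // evaluate a node
        AddTop2,     // pop two results, push their sum
    }
    let mut work = vec![Work::Eval(root)];
    let mut results: Vec<i64> = Vec::new();
    while let Some(item) = work.pop() {
        match item {
            Work::Eval(id) => match arena[id] {
                Node::Int(n) => results.push(n),
                Node::Add(a, b) => {
                    work.push(Work::AddTop2); // combine after children
                    work.push(Work::Eval(b));
                    work.push(Work::Eval(a));
                }
            },
            Work::AddTop2 => {
                let (y, x) = (results.pop().unwrap(), results.pop().unwrap());
                results.push(x + y);
            }
        }
    }
    results.pop().unwrap()
}

fn main() {
    // arena for 1 + (2 + 3)
    let arena = [
        Node::Int(1),    // 0
        Node::Int(2),    // 1
        Node::Int(3),    // 2
        Node::Add(1, 2), // 3
        Node::Add(0, 3), // 4
    ];
    assert_eq!(eval(&arena, 4), 6);
}
```

Every evaluation rule has to be split into "before children" and "after children" steps — tolerable for two variants, painful for fifty-plus, which is why the stacker approach wins for Ori.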
Related Documents
- Evaluator Overview — Architecture, method dispatch, evaluation modes
- Environment — Variable scoping and closure capture
- Value System — Runtime value representation
- Canonicalization — The phase that produces CanExpr
- Pattern Compilation — How match patterns become decision trees