Section 04: Bytecode Compilation
Status: Not Started
Goal: Compile CanExpr (canonical IR) into a flat bytecode representation that a tight dispatch loop can execute without recursive function calls. This eliminates the fundamental overhead of tree-walking: recursive dispatch, branch prediction misses, and pointer chasing through arena-allocated expression nodes.
Context: The current interpreter dispatches via eval_can(can_id) which arena-indexes into CanExpr, pattern-matches on the expression kind, and recursively calls eval_can for sub-expressions. Each dispatch is a function call through the Rust call stack — at minimum 3-5ns for the call/ret alone, plus branch prediction misses on the match (~5-10ns per miss). For deeply nested expressions, this compounds.
Bytecode compilation flattens the recursion: the compiler walks the CanExpr tree once (at compile time) and emits a linear sequence of instructions. The VM then executes these sequentially in a tight loop — no recursion, no arena indexing, no pattern matching on expression kinds.
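To make the contrast concrete, here is a toy sketch (not Ori code, and stack-based rather than register-based purely for brevity): the tree walker recurses per node, while the compiled form executes a flat instruction list in one loop with no recursion.

```rust
// Toy illustration only: `Tree`, `Insn`, `walk`, and `run` are
// hypothetical names, not part of the Ori codebase.
enum Tree {
    Lit(i64),
    Add(Box<Tree>, Box<Tree>),
}

// Tree-walking: one recursive function call per node.
fn walk(t: &Tree) -> i64 {
    match t {
        Tree::Lit(n) => *n,
        Tree::Add(l, r) => walk(l) + walk(r), // recursive dispatch
    }
}

enum Insn {
    Push(i64),
    Add,
}

// Flattened form: a linear instruction sequence run in a tight loop.
fn run(code: &[Insn]) -> i64 {
    let mut stack = Vec::new();
    for insn in code {
        match insn {
            Insn::Push(n) => stack.push(*n),
            Insn::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
        }
    }
    stack.pop().unwrap()
}
```

Both forms compute the same value; the flat form trades a one-time compilation pass for cheaper repeated execution.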
Reference implementations:
- Lua 5.4 `lopcodes.h`: 82 register-based opcodes, 3 instruction formats (ABC, ABx, AsBx), 32-bit fixed-width instructions
- CPython 3.12 `Python/compile.c`: stack-based bytecode compiler, ~8,000 lines; compiles AST → bytecode with constant folding and peephole optimization
- Roc interpreter: flat instruction dispatch with arena-threaded operands — similar concept, different encoding
Depends on: Section 01. Sections 02-03 can inform bytecode data layout, but the compiler should not wait on them.
04.1 Instruction Set Design
File(s): compiler/ori_eval/src/bytecode/mod.rs (new), compiler/ori_eval/src/bytecode/opcode.rs (new)
Design a register-based instruction set for Ori. Register-based (like Lua) rather than stack-based (like CPython) because:
- Fewer instructions per operation (no dup/swap/rot)
- Better maps to Ori’s expression-based semantics (every expression produces a value in a register)
- Simpler optimization (register allocation is explicit)
- Design the instruction encoding:

  ```rust
  /// 32-bit instruction: [opcode:8][A:8][B:8][C:8] or [opcode:8][A:8][Bx:16]
  #[repr(u8)]
  pub enum OpCode {
      // Load/Store
      LoadConst,    // A = constants[Bx]
      LoadNil,      // A = Void
      LoadTrue,     // A = true
      LoadFalse,    // A = false
      Move,         // A = B (register copy)
      // Arithmetic (A = B op C)
      AddInt, SubInt, MulInt, DivInt, ModInt, PowInt,
      AddFloat, SubFloat, MulFloat, DivFloat,
      Negate,       // A = -B
      // Comparison (A = B cmp C → bool)
      EqInt, NeqInt, LtInt, LeInt, GtInt, GeInt,
      EqFloat, NeqFloat, LtFloat, LeFloat,
      EqStr, EqBool,
      // String
      Concat,       // A = B + C (string concat)
      Template,     // A = template string (args in B..B+C)
      // Control flow
      Jump,         // ip += sBx (signed offset)
      JumpIfFalse,  // if !A then ip += sBx
      JumpIfTrue,   // if A then ip += sBx
      // Function calls
      Call,         // A = call B with args C..C+N
      Return,       // return A
      TailCall,     // tail-call B with args C..C+N
      // Closures
      Closure,      // A = new closure from prototype Bx
      GetUpvalue,   // A = upvalues[B]
      SetUpvalue,   // upvalues[B] = A
      // Scope
      GetLocal,     // A = locals[B] (for nested scope access)
      SetLocal,     // locals[B] = A
      // Pattern matching
      TestTag,      // A = (B.tag == C) → bool
      GetField,     // A = B.field[C]
      GetIndex,     // A = B[C]
      SetIndex,     // A[B] = C
      // Collections
      NewList,      // A = new list with B..B+C elements
      NewMap,       // A = new map with B..B+C key-value pairs
      NewTuple,     // A = new tuple with B..B+C elements
      NewStruct,    // A = new struct (type in constants[Bx])
      // Variants
      NewVariant,   // A = variant (type+variant in constants, fields in B..B+C)
      MatchVariant, // A = (B is variant C) → bool
      Unwrap,       // A = unwrap B (Option/Result)
      // Method dispatch
      MethodCall,   // A = B.method(C..C+N) — method name in constants
      // Iterator
      ForPrepare,   // A = iterator from B
      ForLoop,      // A = next(B); if done jump sBx
      // Error handling
      Propagate,    // if B is Err/None, return B (? operator)
  }
  ```
- Do not hard-commit to ~80 opcodes on day one: start with a smaller semantic core ISA (roughly 25-40 opcodes) that can express every `CanExpr` (48 variants) and `DecisionTree` (4 variants) behavior. Add type-specialized fast paths only for operations that profiling proves are hot enough to justify opcode surface area, decoder complexity, and matrix-test cost. Note: the current `CanExpr` uses `CanRange { start: u32, len: u16 }` for variable-length children (args, list elements, etc.) — the bytecode compiler must decide how to encode these (inline count in instruction, or separate extended operand word).
- Design the `Chunk` (compiled function) representation:

  ```rust
  pub struct Chunk {
      code: Vec<u32>,               // Bytecode instructions
      constants: Vec<Value>,        // Constant pool
      prototypes: Vec<Chunk>,       // Nested function prototypes (closures)
      param_count: u8,              // Number of parameters
      register_count: u8,           // Max registers needed
      upvalue_count: u8,            // Number of captured variables
      source_map: Vec<(u32, Span)>, // Instruction index → source span (for errors)
  }
  ```
- Document instruction encoding format: fixed 32-bit width, ABC or ABx format. A = destination register, B/C = source registers or constant indices.
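A minimal sketch of pack/unpack helpers for this layout; the exact bit positions and field order are assumptions, not the final `opcode.rs` design:

```rust
// Hypothetical field layout: opcode in the low byte, A next, then B/C
// or a 16-bit Bx in the high half.
fn encode_abc(op: u8, a: u8, b: u8, c: u8) -> u32 {
    (op as u32) | ((a as u32) << 8) | ((b as u32) << 16) | ((c as u32) << 24)
}

fn encode_abx(op: u8, a: u8, bx: u16) -> u32 {
    (op as u32) | ((a as u32) << 8) | ((bx as u32) << 16)
}

fn decode_op(insn: u32) -> u8 { (insn & 0xFF) as u8 }
fn decode_a(insn: u32) -> u8 { ((insn >> 8) & 0xFF) as u8 }
fn decode_b(insn: u32) -> u8 { ((insn >> 16) & 0xFF) as u8 }
fn decode_c(insn: u32) -> u8 { ((insn >> 24) & 0xFF) as u8 }
fn decode_bx(insn: u32) -> u16 { (insn >> 16) as u16 }
```

Keeping the opcode in a fixed byte position means the dispatch loop decodes it with one mask regardless of format.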
- Handle `CanBindingPattern` destructuring in `Let`: `CanExpr::Let` uses `CanBindingPatternId`, which can be `Name` (simple), `Tuple` (multi-register), `Struct` (field extraction), `List` (head/rest), or `Wildcard` (discard). The bytecode compiler must emit a different instruction sequence for each. A simple `Name` is one register assignment; a `Tuple` pattern (`(a, b) = expr`) needs unwrapping (the inverse of `NewTuple`) into N registers; a `Struct` pattern (`{ x, y } = expr`) needs field extraction; a `List` pattern (`[h, ..t] = expr`) needs a head/tail split.
- Handle `DecisionTree` compilation for `Match`: `CanExpr::Match` carries a `DecisionTreeId` referencing a pre-compiled `DecisionTree` with 4 variants: `Switch { path, test_kind, edges, default }`, `Leaf { arm_index, bindings }`, `Guard { arm_index, bindings, guard, on_fail }`, and `Fail`. The bytecode compiler must walk this tree structure, emitting `TestTag`/comparison + conditional jumps for `Switch`, binding loads for `Leaf`, and guard evaluation + fallthrough for `Guard`. This is the most complex compilation target and should have dedicated tests.
- Validate ISA completeness: walk every `CanExpr` variant (48 total, defined in `compiler/ori_ir/src/canon/expr.rs`) and every `CanBindingPattern` variant (5 total, defined in `compiler/ori_ir/src/canon/patterns.rs`) and verify each maps to one or more opcodes. No gaps. Create an exhaustiveness test that iterates all variants and asserts a mapping exists.
- Address existing `decision_tree` TODOs: `compiler/ori_eval/src/exec/decision_tree/mod.rs` has 2 TODOs referencing “section-07” (lines 266, 276) about `test_kind` usage and numeric discriminants. These are forward references that must be resolved during bytecode compilation — the bytecode compiler must decide how to handle `TestKind` and discriminant-based dispatch. Track these as dependencies for 04.3.
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (04.1) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ...` — surfaced by section-04.1 retrospective — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.1: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
04.2 Bytecode Compiler
File(s): compiler/ori_eval/src/bytecode/compiler.rs (new)
Compile CanExpr trees into Chunk bytecode. This is a single-pass tree walk that emits instructions and manages register allocation.
- Implement `BytecodeCompiler`:

  ```rust
  pub struct BytecodeCompiler<'a> {
      chunk: Chunk,
      arena: &'a CanArena,
      /// Next free register
      next_reg: u8,
      /// Local variable → register mapping
      locals: Vec<(Name, u8)>,
      /// Scope depth for local cleanup
      scope_depth: u32,
  }

  impl<'a> BytecodeCompiler<'a> {
      pub fn compile(arena: &'a CanArena, body: CanId) -> Chunk { ... }

      /// Compile an expression, placing result in `dest` register
      fn compile_expr(&mut self, expr: CanId, dest: u8) { ... }

      /// Allocate a temporary register
      fn alloc_reg(&mut self) -> u8 { ... }

      /// Free a temporary register
      fn free_reg(&mut self, reg: u8) { ... }
  }
  ```
- Implement compilation for each `CanExpr` variant (prioritized by frequency in spec tests):
  - Literals: `Int`, `Float`, `Bool`, `Str`, `Char`, `Duration`, `Size`, `Unit` → `LoadConst`/`LoadTrue`/`LoadFalse`/`LoadNil`. Also `Constant` (`ConstantPool` index).
  - References: `Ident`, `Const` → `Move` from local register. `SelfRef` → `Move` from self register. `FunctionRef` → load from function table. `TypeRef` → load type handle. `HashLength` → load collection length.
  - Operators: `Binary { op, left, right }` → type-specialized opcodes per `BinaryOp`. `Unary { op, operand }` → `Negate`/`Not`/etc. `Cast` → conversion opcodes.
  - Bindings: `Let { pattern, init, mutable }` → compile initializer, assign register(s) to name(s). `CanBindingPatternId` may involve destructuring (tuple, struct, list patterns). `Assign` → update register or heap.
  - Blocks: `Block { stmts, result }` → compile statements sequentially; the last expression is the result register.
  - Control flow: `If` → `JumpIfFalse` + branch compilation. `Match` → compile `DecisionTree` (Switch/Leaf/Guard/Fail). `For` → `ForPrepare`/`ForLoop` + body + yield collection. `Loop` → unconditional loop. `Break`/`Continue` with label resolution.
  - Calls: `Call { func, args: CanRange }` → evaluate args into registers, `Call`. `MethodCall` → receiver + method name from constant pool.
  - Collections: `List(CanRange)` → `NewList`. `Tuple(CanRange)` → `NewTuple`. `Map(CanMapEntryRange)` → `NewMap`. `Struct { name, fields: CanFieldRange }` → `NewStruct`. `Range` → range construction.
  - Algebraic: `Ok`/`Err`/`Some`/`None` → wrap/unwrap opcodes.
  - Error handling: `Try` (`?`) → `Propagate`. `Await` → await opcode (route through existing helper function, not a bespoke opcode).
  - Functions: `Lambda { params, body }` → `Closure` with prototype index + upvalue capture.
  - Access: `Field { receiver, field }` → `GetField`. `Index { receiver, index }` → `GetIndex`.
  - Special forms: `FunctionExp` → route to existing helper functions (Recurse/Parallel/Print/Panic/etc.). `FormatWith` → format helper call. `WithCapability` → capability scope management. `Unsafe` → transparent (compile inner expr).
  - Error: `Error` → unreachable (should not appear in valid programs).
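The `If` lowering in the list above needs forward jumps whose target offsets are unknown when the jump is emitted. A hedged backpatching sketch, with placeholder opcode value and bit layout (offset assumed in the high 16 bits):

```rust
// `Emitter`, `emit_jump_if_false`, and `patch_jump` are illustrative
// names; the real compiler.rs may structure this differently.
const OP_JUMP_IF_FALSE: u8 = 0x30; // placeholder opcode value

struct Emitter {
    code: Vec<u32>,
}

impl Emitter {
    /// Emit JumpIfFalse with a zero offset; return its index for patching.
    fn emit_jump_if_false(&mut self, cond_reg: u8) -> usize {
        self.code.push(OP_JUMP_IF_FALSE as u32 | ((cond_reg as u32) << 8));
        self.code.len() - 1
    }

    /// Patch the jump at `at` to skip everything emitted since then.
    fn patch_jump(&mut self, at: usize) {
        // Offset is relative to the instruction after the jump.
        let offset = (self.code.len() - at - 1) as u32;
        self.code[at] |= offset << 16;
    }
}
```

Compiling `if` then becomes: emit the jump, compile the then-branch, patch; the else-branch adds one more unconditional jump patched the same way.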
- Inventory the entire semantic surface, not just common expression shapes (48 `CanExpr` variants total):
  - Literals (8): `Int`, `Float`, `Bool`, `Str`, `Char`, `Duration`, `Size`, `Unit`
  - Constants (1): `Constant` (index into `ConstantPool`)
  - References (6): `Ident`, `Const`, `SelfRef`, `FunctionRef`, `TypeRef`, `HashLength`
  - Operators (3): `Binary` (with `BinaryOp` — 16+ arithmetic/comparison/logic ops), `Unary` (with `UnaryOp`), `Cast` (infallible/fallible)
  - Calls (2): `Call`, `MethodCall`
  - Access (2): `Field`, `Index`
  - Control flow (6): `If`, `Match` (carries a compiled `DecisionTree` — compile that tree directly), `For` (with label/pattern/guard/yield), `Loop`, `Break` (with label/value), `Continue` (with label/value)
  - Bindings (3): `Block`, `Let` (with `CanBindingPatternId`), `Assign`
  - Functions (1): `Lambda` (with `CanParamRange`)
  - Collections (5): `List`, `Tuple`, `Map`, `Struct`, `Range`
  - Algebraic (4): `Ok`, `Err`, `Some`, `None`
  - Error handling (2): `Try` (`?` operator), `Await`
  - Safety (1): `Unsafe`
  - Capabilities (1): `WithCapability`
  - Special forms (2): `FunctionExp` (15 `FunctionExpKind` variants: Recurse/Parallel/Spawn/Timeout/Cache/With/Print/Panic/Catch/Todo/Unreachable/Channel/ChannelIn/ChannelOut/ChannelAll), `FormatWith`
  - Error recovery (1): `Error`
  - Record which constructs must remain calls into existing helper paths during the first VM cut instead of forcing bespoke opcodes immediately (likely: `FunctionExp`, `WithCapability`, `Match` with a complex `DecisionTree`, `FormatWith`)
- Register allocation strategy: linear scan — each expression gets a register, freed when no longer needed. Parameters occupy registers 0..N-1. Temporaries start at N.
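A sketch of the stack-discipline allocator this strategy implies; the names (`RegAlloc`, `alloc`, `free`) are illustrative, not the final API:

```rust
// Because expression evaluation is properly nested, temporaries can be
// allocated and freed in LIFO order, so one counter suffices.
struct RegAlloc {
    next_reg: u8,
    max_used: u8, // becomes Chunk::register_count
}

impl RegAlloc {
    fn new(param_count: u8) -> Self {
        // Parameters occupy registers 0..param_count; temporaries follow.
        RegAlloc { next_reg: param_count, max_used: param_count }
    }

    fn alloc(&mut self) -> u8 {
        let r = self.next_reg;
        self.next_reg += 1;
        if self.next_reg > self.max_used {
            self.max_used = self.next_reg;
        }
        r
    }

    fn free(&mut self, reg: u8) {
        // Stack discipline: only the most recently allocated temporary
        // may be freed.
        debug_assert_eq!(reg + 1, self.next_reg);
        self.next_reg = reg;
    }
}
```

Tracking `max_used` as a high-water mark gives the `register_count` the frame needs at function entry.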
- Constant folding: evaluate constant expressions at compile time (e.g., `2 + 3` → `LoadConst 5`). This is the existing const eval path.
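For instance, integer addition over literal operands folds recursively; `FoldExpr` below is a hypothetical stand-in for the relevant `CanExpr` shapes, not the real IR:

```rust
// Returns Some(folded value) when the subtree is fully constant,
// None when any operand is non-constant (or the add would overflow),
// in which case the compiler emits the normal instruction sequence.
enum FoldExpr {
    IntLit(i64),
    Add(Box<FoldExpr>, Box<FoldExpr>),
}

fn fold(e: &FoldExpr) -> Option<i64> {
    match e {
        FoldExpr::IntLit(n) => Some(*n),
        FoldExpr::Add(l, r) => fold(l)?.checked_add(fold(r)?),
    }
}
```

Using `checked_add` keeps overflow semantics identical whether the add happens at compile time or at run time.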
- Crate placement decision: the bytecode module lives inside `ori_eval` (not a new crate) because it consumes `CanExpr`/`CanArena` from `ori_ir` and produces `Value` types from `ori_eval`. It is a new execution backend for the same evaluator crate. Files: `compiler/ori_eval/src/bytecode/{mod.rs, opcode.rs, compiler.rs, chunk.rs}`.
- TPR checkpoint after 04.2: run `/tpr-review` after the bytecode compiler handles all 48 `CanExpr` variants. This catches ISA design issues before 04.3 (closures/multi-clause) compounds complexity. All findings must be resolved before proceeding.
- Design `Chunk` so Section 07 can cache it without reworking the IR: give it `Clone + Debug` and a stable, explicit layout. Do not block Section 04 on whether chunks live in a Salsa query or a side cache; that cache-boundary decision belongs to Section 07 once the runtime integration seam is implemented.
- Matrix tests (in `compiler/ori_eval/src/bytecode/compiler/tests.rs`):
  - Dimensions: `CanExpr` category (literals, references, operators, calls, control flow, bindings, functions, collections, algebraic, error handling, special forms) x encoding format (ABC vs ABx) x register pressure (1 reg, 5 regs, max regs)
  - Semantic pin: compile `let $x = 2 + 3` and verify it emits `LoadConst(r0, 2), LoadConst(r1, 3), AddInt(r2, r0, r1)` (or the equivalent after constant folding: `LoadConst(r0, 5)`)
  - Negative pin: compile `CanExpr::Error` and verify it produces an unreachable/error opcode, not a valid instruction sequence
  - TDD ordering: design ISA types FIRST (opcode enum, `Chunk` struct), write compilation tests for literals and simple expressions, verify they fail (compiler doesn’t exist), then implement the compiler incrementally by `CanExpr` category
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (04.2) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ...` — surfaced by section-04.2 retrospective — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.2: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
04.3 Constant Pool and Closures
File(s): compiler/ori_eval/src/bytecode/compiler.rs, compiler/ori_eval/src/bytecode/closure.rs (new)
Handle the tricky parts: constant deduplication, closure compilation with upvalue capture, and multi-clause function dispatch.
- Constant pool deduplication: deduplicate canonical constants (`ConstValue`/`ConstantId`) rather than hashing arbitrary runtime `Value`s. The compiler should not depend on full runtime `Value` equality for semantic constants.
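One way to get “same constant, same index” is interning through a map keyed on the canonical constant. `ConstKey` below is a toy stand-in for `ConstValue`/`ConstantId`, and the pool shape is an assumption:

```rust
use std::collections::HashMap;

// Canonical constants are hashable/comparable by construction, unlike
// arbitrary runtime Values.
#[derive(Clone, PartialEq, Eq, Hash)]
enum ConstKey {
    Int(i64),
    Str(String),
}

struct ConstPool {
    entries: Vec<ConstKey>,
    index: HashMap<ConstKey, u16>,
}

impl ConstPool {
    fn new() -> Self {
        ConstPool { entries: Vec::new(), index: HashMap::new() }
    }

    /// Return the pool index for `k`, inserting it only once.
    fn intern(&mut self, k: ConstKey) -> u16 {
        if let Some(&i) = self.index.get(&k) {
            return i; // already pooled: same value, same index
        }
        let i = self.entries.len() as u16;
        self.entries.push(k.clone());
        self.index.insert(k, i);
        i
    }
}
```

The `u16` index matches the Bx operand width assumed by the encoding discussion in 04.1.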
- Closure compilation: when compiling a lambda/closure:
  - Compile the body as a nested `Chunk` (prototype)
  - Identify free variables → these become upvalues
  - Emit a `Closure` instruction that creates the closure object with upvalue references
  - Access upvalues via `GetUpvalue`/`SetUpvalue` in the inner function
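The free-variable identification step can be sketched as a scope-tracking walk; the `Expr` type here is a toy stand-in for the canonical IR, not its real shape:

```rust
use std::collections::HashSet;

enum Expr {
    Var(String),
    Lambda(Vec<String>, Box<Expr>),
    Call(Box<Expr>, Vec<Expr>),
}

/// Collect variables referenced but not bound: these become upvalues.
fn free_vars(e: &Expr, bound: &mut HashSet<String>, out: &mut Vec<String>) {
    match e {
        Expr::Var(name) => {
            if !bound.contains(name) && !out.contains(name) {
                out.push(name.clone()); // candidate upvalue
            }
        }
        Expr::Lambda(params, body) => {
            // Parameters shadow outer bindings for the body only.
            let added: Vec<_> =
                params.iter().filter(|p| bound.insert((*p).clone())).collect();
            free_vars(body, bound, out);
            for p in added {
                bound.remove(p);
            }
        }
        Expr::Call(f, args) => {
            free_vars(f, bound, out);
            for a in args {
                free_vars(a, bound, out);
            }
        }
    }
}
```

The resulting ordered list doubles as the upvalue index space the `Closure` instruction captures against.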
- Closure metadata must preserve current evaluator behavior: default parameters, captured capabilities, memoized-function wrapping, and any source/backtrace data needed by diagnostics cannot be implicit “future work”.
- Multi-clause functions and `match` lower through the existing decision-tree contract:
  - Compile `DecisionTree::Switch { path: ScrutineePath, test_kind: TestKind, edges, default }` — emit scrutinee path navigation (field access, index, variant tag extraction via `ScrutineePath`), then a test chain (compare the scrutinee against each edge’s `TestValue`), then conditional jumps to subtree code
  - Compile `DecisionTree::Leaf { arm_index, bindings: Vec<(Name, ScrutineePath)> }` — emit path-based binding loads (navigate the scrutinee via `ScrutineePath` to extract each bound variable into a register), then jump to the arm body
  - Compile `DecisionTree::Guard { arm_index, bindings, guard: CanId, on_fail }` — emit bindings, evaluate the guard expression, `JumpIfFalse` to the `on_fail` subtree code. Preserve guard fallback semantics exactly (fall through to the remaining compatible arms, not just the next sequential arm)
  - Compile `DecisionTree::Fail` — emit `unreachable` (exhaustiveness guarantees this never executes)
  - Only after the above is working, consider specialized opcodes for hot literal tests
- Tail call detection: when a `Call` is the last instruction before `Return`, emit `TailCall` instead — this enables the VM to reuse the current frame.
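One hedged way to realize this is a peephole pass over emitted code; the real compiler would more likely track tail position during the tree walk, and the opcode values here are placeholders:

```rust
// Placeholder opcode values, not the final ISA.
const OP_CALL: u8 = 0x20;
const OP_RETURN: u8 = 0x21;
const OP_TAILCALL: u8 = 0x22;

/// Rewrite Call → TailCall whenever a Call immediately precedes Return.
/// Simplified: a real pass would also confirm that Return's register is
/// the Call's destination before rewriting.
fn rewrite_tail_calls(code: &mut [u32]) {
    for i in 0..code.len().saturating_sub(1) {
        let op = (code[i] & 0xFF) as u8;
        let next_op = (code[i + 1] & 0xFF) as u8;
        if op == OP_CALL && next_op == OP_RETURN {
            // Swap the opcode byte in place, keeping operands intact.
            code[i] = (code[i] & !0xFF) | OP_TAILCALL as u32;
        }
    }
}
```

Either approach yields the same bytecode; the tree-walk variant avoids a second pass and has the tail-position fact for free.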
- Matrix tests (in `compiler/ori_eval/src/bytecode/compiler/tests.rs`):
  - Dimensions: closure capture count (0, 1, 5) x nesting (flat, nested) x `DecisionTree` variant (Switch with literals, Leaf with bindings, Guard with fallthrough, Fail) x tail position (tail call, non-tail call)
  - Semantic pin: multi-clause Ackermann compiles to a `DecisionTree`-driven branch chain that matches the tree structure from `compiler/ori_ir/src/canon/tree/mod.rs`
  - Negative pin: a non-tail call (the inner `ack(m:, n: n - 1)` in the third clause) does NOT emit a `TailCall` opcode
  - TDD ordering: write closure compilation tests and `DecisionTree` compilation tests FIRST, verify they fail, then implement
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (04.3) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ...` — surfaced by section-04.3 retrospective — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.3: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
04.R Third Party Review Findings
- None.
04.N Completion Checklist
- All `CanExpr` variants have corresponding bytecode compilation
- `BytecodeCompiler::compile()` produces a valid `Chunk` for all spec test programs
- Constant pool deduplication works (same value → same index)
- Closure compilation produces correct upvalue references
- Multi-clause dispatch compiles to a branch chain
- Tail calls detected and emitted as `TailCall`
- Source maps preserved (every instruction traces to a source span)
- `./test-all.sh` green (tree-walker still used for execution — bytecode compilation is verified but not yet executed)
- Plan annotation cleanup: `bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 04` returns 0 annotations
- All intermediate TPR checkpoint findings resolved
- `/tpr-review` passed
- `/impl-hygiene-review` passed
- `/improve-tooling` retrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which `diagnostics/` scripts you ran, which command sequences you repeated, where you added ad-hoc `dbg!`/`tracing` calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE `/commit-push`. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See `.claude/skills/improve-tooling/SKILL.md` “Retrospective Mode” for the full protocol.
Exit Criteria: BytecodeCompiler::compile() successfully compiles all 5,800+ spec test programs to bytecode without errors. The bytecode is not yet executed (that’s Section 05) — this section proves the compilation is complete and correct by round-trip verification: compile to bytecode, disassemble, verify instruction sequence matches expected output for representative programs (Ackermann, FizzBuzz, closures, pattern matching).