Architecture Overview

The Ori compiler (oric) is an incremental compiler built on Salsa, a framework for on-demand, incremental computation. The architecture prioritizes:

  1. Incrementality - Only recompute what changes
  2. Memory efficiency - Arena allocation, string interning
  3. Extensibility - Registry-based patterns and diagnostics
  4. Testability - Dependency injection via SharedRegistry

High-Level Structure

The compiler is organized as a Cargo workspace with multiple crates:

compiler/
├── ori_ir/                   # Core IR types (tokens, spans, AST, interning)
│   └── src/
│       ├── lib.rs            # Module organization, static_assert_size! macro
│       ├── ast/              # Expression and statement types
│       ├── token.rs          # Token definitions
│       ├── span.rs           # Source location tracking
│       ├── arena.rs          # Expression arena allocation
│       ├── interner.rs       # String interning
│       └── visitor.rs        # AST visitor pattern
├── ori_diagnostic/           # Error reporting
│   └── src/
│       ├── lib.rs            # Module organization and re-exports
│       ├── error_code.rs     # ErrorCode enum, as_str(), Display
│       ├── diagnostic.rs     # Diagnostic, Label, Severity, Applicability, Suggestion
│       ├── guarantee.rs      # ErrorGuaranteed type-level proof
│       ├── queue.rs          # DiagnosticQueue (deduplication, limits)
│       ├── span_utils.rs     # Line/column computation from spans
│       ├── errors/           # Embedded error documentation
│       ├── emitter/          # Output formatting (terminal, JSON, SARIF)
│       │   ├── mod.rs        # Emitter trait, trailing_comma() helper
│       │   ├── terminal.rs   # Terminal output (colored)
│       │   ├── json.rs       # JSON output
│       │   └── sarif.rs      # SARIF format (BTreeSet for rule dedup)
│       └── fixes/            # Code suggestions and fixes
├── ori_lexer/                # Tokenization (logos-based)
│   └── src/lib.rs            # lex() function, token processing
├── ori_types/                # Type system definitions
│   └── src/
│       ├── lib.rs            # Module exports
│       ├── core.rs           # Type enum (external API)
│       ├── data.rs           # TypeData enum (internal representation)
│       ├── type_interner.rs  # TypeInterner, SharedTypeInterner
│       ├── context.rs        # InferenceContext (TypeId-based unification)
│       ├── env.rs            # TypeEnv for scoping
│       ├── traverse.rs       # TypeFolder, TypeVisitor
│       └── error.rs          # TypeError
├── ori_parse/                # Recursive descent parser
│   └── src/
│       ├── lib.rs            # Parser struct, parse() entry point
│       ├── error.rs          # Parse error types
│       ├── stack.rs          # Stack safety (stacker integration)
│       └── grammar/          # Grammar modules (expr, item, type, etc.)
├── ori_patterns/             # Pattern system, Value types
│   └── src/
│       ├── lib.rs            # PatternDefinition, TypeCheckContext
│       ├── registry.rs       # PatternRegistry, SharedPattern
│       ├── value/            # Value types, Heap, FunctionValue
│       ├── errors.rs         # EvalError, EvalResult
│       └── *.rs              # Pattern implementations
├── ori_eval/                 # Core interpreter (tree-walking evaluator)
│   └── src/
│       ├── lib.rs            # Module exports, re-exports from ori_patterns
│       ├── environment.rs    # Environment, Scope, LocalScope
│       ├── errors.rs         # EvalError factories
│       ├── operators.rs      # Binary operator dispatch
│       ├── unary_operators.rs # Unary operator dispatch
│       ├── methods.rs        # Built-in method dispatch, EVAL_BUILTIN_METHODS constant
│       ├── function_val.rs   # Type conversion functions (int, float, str, byte)
│       ├── user_methods.rs   # UserMethodRegistry
│       ├── print_handler.rs  # Print output capture
│       ├── shared.rs         # SharedRegistry, SharedMutableRegistry
│       ├── stack.rs          # Stack safety (stacker)
│       ├── exec/             # Expression execution
│       │   ├── expr.rs       # Expression evaluation
│       │   ├── call.rs       # Function call evaluation
│       │   ├── control.rs    # Control flow (if, for, loop)
│       │   └── pattern.rs    # Pattern matching
│       └── interpreter/      # Core interpreter
│           ├── mod.rs        # Interpreter struct
│           ├── builder.rs    # InterpreterBuilder
│           ├── scope_guard.rs # RAII scope management
│           ├── function_call.rs # User function calls
│           ├── function_seq.rs  # run/try/match evaluation
│           ├── method_dispatch.rs # Method resolution
│           ├── derived_methods.rs # Derived trait methods
│           └── resolvers/    # Method resolution chain
│               ├── mod.rs    # MethodDispatcher, MethodResolver trait
│               ├── user_registry.rs  # User methods
│               ├── collection.rs     # List/range methods
│               └── builtin.rs        # Built-in methods
├── ori-macros/               # Proc-macro crate
│   └── src/
│       ├── lib.rs            # Diagnostic/Subdiagnostic derives
│       ├── diagnostic.rs     # #[derive(Diagnostic)] impl
│       └── subdiagnostic.rs  # #[derive(Subdiagnostic)] impl
└── oric/                     # CLI orchestrator + Salsa queries
    └── src/
        ├── lib.rs            # Module organization
        ├── main.rs           # CLI dispatcher (thin: delegates to commands/)
        ├── commands/         # Command handlers (extracted from main.rs)
        │   ├── mod.rs        # Re-exports all command functions
        │   ├── run.rs        # run_file()
        │   ├── test.rs       # run_tests()
        │   ├── check.rs      # check_file()
        │   ├── compile.rs    # compile_file()
        │   ├── explain.rs    # explain_error(), parse_error_code()
        │   └── debug.rs      # parse_file(), lex_file()
        ├── db.rs             # Salsa database definition
        ├── query/            # Salsa query definitions
        ├── typeck/           # Type checking and inference
        ├── eval/             # High-level evaluator (wraps ori_eval)
        │   ├── mod.rs        # Re-exports, value module
        │   ├── output.rs     # EvalOutput, ModuleEvalResult
        │   ├── evaluator/    # Evaluator wrapper
        │   │   ├── mod.rs    # Evaluator struct
        │   │   ├── builder.rs # EvaluatorBuilder
        │   │   └── module_loading.rs # Module loading, prelude
        │   └── module/       # Import resolution
        │       └── import.rs # Module import handling
        ├── test/             # Test runner
        └── debug.rs          # Debug flags

Crate Dependencies

ori_ir (base)
    ├── ori_diagnostic
    ├── ori_lexer
    ├── ori_types
    ├── ori_parse
    └── ori_patterns ──→ ori_types
            │
            └── ori_eval ──→ ori_patterns
                    │
                    └── oric ──→ ALL (orchestrator)

Layered architecture:

  • ori_ir: Core IR types (no dependencies)
  • ori_patterns: Pattern definitions, Value types, EvalError (single source of truth)
  • ori_eval: Core tree-walking interpreter (Interpreter, Environment, exec, method dispatch)
  • oric: CLI orchestrator with Salsa queries, type checker, high-level Evaluator wrapper

Pure functions live in library crates; Salsa queries live in oric.

Design Principles

Salsa-First Architecture

Every major computation is a Salsa query. This provides:

  • Automatic caching - Query results are memoized
  • Dependency tracking - Salsa knows what depends on what
  • Early cutoff - If a query's output is unchanged, dependents skip recomputation

Each pipeline stage is declared as a tracked function:

#[salsa::tracked]
pub fn tokens(db: &dyn Db, file: SourceFile) -> TokenList { ... }

#[salsa::tracked]
pub fn parsed(db: &dyn Db, file: SourceFile) -> ParseResult { ... }

#[salsa::tracked]
pub fn typed(db: &dyn Db, file: SourceFile) -> TypedModule { ... }
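
To make early cutoff concrete, the following is a hand-rolled sketch of the idea. It does not use the Salsa API: MiniDb, Memo, and the revision counters are hypothetical names chosen for illustration, and Salsa's real bookkeeping is considerably more general.

```rust
/// A memoized value plus the bookkeeping needed for early cutoff.
struct Memo<T> {
    value: T,
    verified_at: u64, // revision at which the value was last confirmed valid
    changed_at: u64,  // revision at which the value last actually changed
}

/// Hypothetical two-query database: source text -> tokens -> token count.
pub struct MiniDb {
    source: String,
    revision: u64,
    tokens: Option<Memo<Vec<String>>>,
    count: Option<Memo<usize>>,
    pub tokens_runs: u32, // how often the token query body executed
    pub count_runs: u32,  // how often the count query body executed
}

impl MiniDb {
    pub fn new(source: &str) -> Self {
        MiniDb {
            source: source.to_owned(),
            revision: 1,
            tokens: None,
            count: None,
            tokens_runs: 0,
            count_runs: 0,
        }
    }

    /// Changing an input bumps the global revision.
    pub fn set_source(&mut self, source: &str) {
        self.source = source.to_owned();
        self.revision += 1;
    }

    /// Brings the token memo up to date and returns the revision at which
    /// the token list last changed.
    fn refresh_tokens(&mut self) -> u64 {
        if let Some(memo) = &self.tokens {
            if memo.verified_at == self.revision {
                return memo.changed_at; // already checked in this revision
            }
        }
        self.tokens_runs += 1;
        let fresh: Vec<String> = self.source.split_whitespace().map(str::to_owned).collect();
        let changed_at = match &self.tokens {
            // Early cutoff: same output as before, so keep the old changed_at.
            Some(old) if old.value == fresh => old.changed_at,
            _ => self.revision,
        };
        self.tokens = Some(Memo { value: fresh, verified_at: self.revision, changed_at });
        changed_at
    }

    /// Dependent query: re-executes only if the tokens changed since the
    /// count was last verified.
    pub fn token_count(&mut self) -> usize {
        let tokens_changed_at = self.refresh_tokens();
        if let Some(memo) = &mut self.count {
            if memo.verified_at >= tokens_changed_at {
                memo.verified_at = self.revision;
                return memo.value;
            }
        }
        self.count_runs += 1;
        let value = self.tokens.as_ref().map_or(0, |m| m.value.len());
        self.count = Some(Memo { value, verified_at: self.revision, changed_at: self.revision });
        value
    }
}
```

In this sketch, editing the source from "a b" to "a   b" re-runs the token query, but its output is unchanged, so token_count() returns its cached value without executing again.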

Flat Data Structures

The AST uses arena allocation instead of Box<T>:

// Instead of this:
struct Expr {
    kind: ExprKind,
    children: Vec<Box<Expr>>,
}

// We use this:
struct Expr {
    kind: ExprKind,
    span: Span,
}

struct ExprArena {
    exprs: Vec<Expr>,  // Indexed by ExprId(u32)
}

Benefits:

  • Better cache locality
  • Simpler memory management
  • Efficient serialization for Salsa
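
A minimal sketch of how such an arena might be used, assuming child expressions are referenced by ExprId inside ExprKind. The ExprKind variants, the Span fields, and the alloc, get, and eval methods are illustrative and not taken from ori_ir.

```rust
/// Index into ExprArena.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExprId(u32);

/// Source location as start/end offsets (field names are an assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

/// Hypothetical, reduced expression kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExprKind {
    Int(i64),
    // Children are referenced by id rather than by Box<Expr>.
    Add(ExprId, ExprId),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Expr {
    pub kind: ExprKind,
    pub span: Span,
}

#[derive(Default)]
pub struct ExprArena {
    exprs: Vec<Expr>,
}

impl ExprArena {
    /// Appends an expression and returns its index.
    /// A production arena would guard against exceeding u32::MAX entries.
    pub fn alloc(&mut self, kind: ExprKind, span: Span) -> ExprId {
        let id = ExprId(self.exprs.len() as u32);
        self.exprs.push(Expr { kind, span });
        id
    }

    pub fn get(&self, id: ExprId) -> &Expr {
        &self.exprs[id.0 as usize]
    }

    /// Walks the tree by following ids through the flat vector.
    pub fn eval(&self, id: ExprId) -> i64 {
        match self.get(id).kind {
            ExprKind::Int(n) => n,
            ExprKind::Add(lhs, rhs) => self.eval(lhs) + self.eval(rhs),
        }
    }
}

/// Builds the expression `1 + 2` and returns the arena with the root id.
pub fn build_one_plus_two() -> (ExprArena, ExprId) {
    let mut arena = ExprArena::default();
    let one = arena.alloc(ExprKind::Int(1), Span { start: 0, end: 1 });
    let two = arena.alloc(ExprKind::Int(2), Span { start: 4, end: 5 });
    let root = arena.alloc(ExprKind::Add(one, two), Span { start: 0, end: 5 });
    (arena, root)
}
```

Because every node lives in one Vec and ids are plain integers, the whole tree can be copied, compared, or dropped as a single allocation.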

String Interning

All identifiers are interned:

// Name is just a u32 index
let name1: Name = interner.intern("foo");
let name2: Name = interner.intern("foo");
assert_eq!(name1, name2);  // O(1) comparison
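
One way such an interner could be built is sketched below. This is an assumption-laden illustration: the Interner struct, its fields, and the resolve method are hypothetical, and it takes &mut self for simplicity, whereas the snippet above suggests the real interner can be used through a shared reference.

```rust
use std::collections::HashMap;

/// Interned identifier: a u32 index into the interner's string table.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Name(u32);

/// Hypothetical single-threaded interner.
#[derive(Default)]
pub struct Interner {
    lookup: HashMap<String, Name>, // text -> id
    strings: Vec<String>,          // id -> text
}

impl Interner {
    /// Returns the existing Name for `text`, or assigns the next free index.
    pub fn intern(&mut self, text: &str) -> Name {
        if let Some(&name) = self.lookup.get(text) {
            return name;
        }
        let name = Name(self.strings.len() as u32);
        self.strings.push(text.to_owned());
        self.lookup.insert(text.to_owned(), name);
        name
    }

    /// Maps a Name back to the text it was created from.
    pub fn resolve(&self, name: Name) -> &str {
        &self.strings[name.0 as usize]
    }
}
```

Equal strings always map to the same index, so comparing two Name values is an integer comparison regardless of identifier length.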

Registry Pattern

Patterns and diagnostics use registries for extensibility:

pub struct PatternRegistry {
    patterns: HashMap<Name, Box<dyn PatternDefinition>>,
}

impl PatternRegistry {
    pub fn register(&mut self, name: &str, pattern: impl PatternDefinition) { ... }
    pub fn get(&self, name: Name) -> Option<&dyn PatternDefinition> { ... }
}
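
The registry can be exercised with a small self-contained sketch. To stay independent of the interner, this version keys the map by String instead of Name; the reduced PatternDefinition trait with a single describe method and the TimeoutPattern type are hypothetical stand-ins, not the real ori_patterns definitions.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced trait; the real PatternDefinition is richer.
pub trait PatternDefinition {
    fn describe(&self) -> String;
}

/// Hypothetical pattern used only for this example.
pub struct TimeoutPattern;

impl PatternDefinition for TimeoutPattern {
    fn describe(&self) -> String {
        "timeout".to_string()
    }
}

#[derive(Default)]
pub struct PatternRegistry {
    // Assumption: keyed by String here; the original keys by interned Name.
    patterns: HashMap<String, Box<dyn PatternDefinition>>,
}

impl PatternRegistry {
    /// Stores the pattern behind a trait object so callers can add new
    /// patterns without the registry knowing their concrete types.
    pub fn register(&mut self, name: &str, pattern: impl PatternDefinition + 'static) {
        self.patterns.insert(name.to_owned(), Box::new(pattern));
    }

    /// Looks a pattern up by name; `&**boxed` turns &Box<dyn T> into &dyn T.
    pub fn get(&self, name: &str) -> Option<&dyn PatternDefinition> {
        self.patterns.get(name).map(|boxed| &**boxed)
    }
}
```

New patterns are added by calling register at startup; lookup code never needs to change when a pattern is introduced.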

Key Types

Type             Crate           Purpose
SourceFile       oric            Salsa input - source text
TokenList        ori_ir          Lexer output
Token            ori_ir          Individual token with kind and span
Span             ori_ir          Source location (start/end offsets)
Module           ori_ir          Parsed module structure
ExprArena        ori_ir          Expression storage
ExprId           ori_ir          Index into ExprArena
Name             ori_ir          Interned string identifier
TypeId           ori_ir          Interned type identifier (sharded: 4-bit shard + 28-bit local)
Type             ori_types       External type representation (uses Box)
TypeData         ori_types       Internal type representation (uses TypeId)
TypeInterner     ori_types       Sharded type interning for O(1) equality
Value            ori_patterns    Runtime values (re-exported via ori_eval)
Interpreter      ori_eval        Core tree-walking interpreter
Environment      ori_eval        Variable scoping (scope stack)
Evaluator        oric            High-level evaluator (module loading, prelude)
Diagnostic       ori_diagnostic  Rich error with suggestions
ErrorGuaranteed  ori_diagnostic  Proof that an error was emitted
Applicability    ori_diagnostic  Fix confidence level
ParseResult      ori_parse       Parser output (module + arena + errors)
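
The TypeId entry above describes a 32-bit identifier split into a 4-bit shard and a 28-bit local index. A possible packing is sketched below, assuming the shard occupies the high bits; the bit order, constant names, and the new, shard, and local methods are assumptions rather than the actual ori_ir layout.

```rust
/// Interned type identifier packed into a single u32.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TypeId(u32);

impl TypeId {
    const LOCAL_BITS: u32 = 28;
    const LOCAL_MASK: u32 = (1 << Self::LOCAL_BITS) - 1;
    const MAX_SHARD: u32 = 0xF; // 4 bits => 16 shards

    /// Packs a shard number and a shard-local index, rejecting values
    /// that do not fit in their bit fields.
    pub fn new(shard: u32, local: u32) -> Option<TypeId> {
        if shard > Self::MAX_SHARD || local > Self::LOCAL_MASK {
            return None;
        }
        Some(TypeId((shard << Self::LOCAL_BITS) | local))
    }

    /// Which shard of the interner owns this type.
    pub fn shard(self) -> u32 {
        self.0 >> Self::LOCAL_BITS
    }

    /// Index of the type within its shard.
    pub fn local(self) -> u32 {
        self.0 & Self::LOCAL_MASK
    }
}
```

With this layout, two TypeId values are equal exactly when their shard and local index match, which is what makes type equality a single integer comparison.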

Crate Organization

Crate           Purpose
ori_ir          Core IR types: tokens, spans, AST, arena, string interning, TypeId
ori_diagnostic  Error reporting (split: error_code, diagnostic, guarantee), DiagnosticQueue, emitters, error docs
ori_lexer       Tokenization via logos
ori_types       Type system: Type/TypeData, TypeInterner, InferenceContext, TypeIdFolder
ori_parse       Recursive descent parser
ori_patterns    Pattern definitions, Value types, EvalError (single source of truth)
ori_eval        Core tree-walking interpreter: Interpreter, Environment, exec, method dispatch
ori-macros      Proc-macros (#[derive(Diagnostic)], etc.)
oric            CLI orchestrator, Salsa queries, typeck, high-level Evaluator, patterns

DRY Re-exports

To avoid code duplication, oric re-exports from source crates rather than maintaining duplicate definitions:

oric Module       Re-exports From
oric::ir          ori_ir
oric::parser      ori_parse
oric::diagnostic  ori_diagnostic
oric::types       ori_types

This pattern ensures:

  • Single source of truth for each type
  • Consistent behavior across the codebase
  • Easier maintenance and refactoring
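
The re-export mechanism can be sketched in a single file, with modules standing in for the separate crates. The module bodies and the glob re-export are assumptions for illustration; the real oric::ir may re-export a narrower set of items.

```rust
// Stand-in for the ori_ir crate (contents are hypothetical).
pub mod ori_ir {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub struct Span {
        pub start: u32,
        pub end: u32,
    }
}

// Stand-in for the oric crate.
pub mod oric {
    pub mod ir {
        // Re-export instead of redefining: oric::ir::Span and ori_ir::Span
        // name the same type, so values pass between them unchanged.
        pub use super::super::ori_ir::*;
    }
}

/// Accepts the type under its re-exported path.
pub fn span_len(span: oric::ir::Span) -> u32 {
    span.end - span.start
}
```

A value constructed as ori_ir::Span can be passed directly to span_len, since the re-export introduces a second path to the type rather than a second type.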

File Size Guidelines

To maintain code quality, files follow size limits:

  • Target: ~500 lines per file
  • Maximum: 800 lines per file
  • Exception: Grammar files may be larger due to many variants

When files exceed limits, extract submodules:

  • evaluator.rs -> eval/exec/expr.rs, eval/exec/call.rs, etc.
  • types.rs -> typeck/infer/expr.rs, typeck/infer/call.rs, etc.