Data Flow
This document describes how data flows through the Ori compiler, from source text to execution result.
Overview
flowchart TB
A["Source Text"] --> B["Lexer"]
B --> C["TokenList"]
C --> D["Parser"]
D --> E["Module + ExprArena"]
E --> F["Type Checker"]
F --> G["TypedModule (expr_types)"]
G --> H["Evaluator"]
H --> I["ModuleEvalResult (Value)"]
G -.->|"(pending)"| J["LLVM Codegen"]
J -.-> K["Native Binary"]
style J stroke-dasharray: 5 5
style K stroke-dasharray: 5 5
Stage 1: Lexing
Input: Source text (String)
Output: TokenList
pub struct TokenList {
tokens: Vec<Token>,
spans: Vec<Span>,
}
pub struct Token {
kind: TokenKind,
// Identifiers store interned Name
}
Data transformations:
- Character stream -> Token stream
- Identifiers -> Interned Names (via Interner)
- Literals -> Typed values (Int, Float, String)
- Comments -> Discarded (not in token stream)
Example:
"let x = 42"
-> TokenList {
tokens: [
Token { kind: Let },
Token { kind: Ident(Name(0)) }, // "x" interned
Token { kind: Eq },
Token { kind: Int(42) },
],
spans: [Span(0,3), Span(4,5), Span(6,7), Span(8,10)]
}
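The transformation above can be sketched as a small hand-rolled scanner. This is a minimal illustration, not the compiler's actual lexer: the names `lex`, `TokenKind`, and `Span` mirror the doc, but the real implementation interns identifiers as `Name`s rather than storing strings, and reports `LexError`s instead of skipping unknown characters.

```rust
#[derive(Debug, PartialEq)]
enum TokenKind {
    Let,
    Ident(String), // the real lexer stores an interned Name instead
    Eq,
    Int(i64),
}

#[derive(Debug, PartialEq)]
struct Span { start: u32, end: u32 }

fn lex(src: &str) -> Vec<(TokenKind, Span)> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(i, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // whitespace is discarded, like comments
        } else if c.is_ascii_digit() {
            let (start, mut end) = (i, i);
            while let Some(&(j, d)) = chars.peek() {
                if !d.is_ascii_digit() { break; }
                end = j + d.len_utf8();
                chars.next();
            }
            tokens.push((
                TokenKind::Int(src[start..end].parse().unwrap()),
                Span { start: start as u32, end: end as u32 },
            ));
        } else if c.is_alphabetic() || c == '_' {
            let (start, mut end) = (i, i);
            while let Some(&(j, d)) = chars.peek() {
                if !(d.is_alphanumeric() || d == '_') { break; }
                end = j + d.len_utf8();
                chars.next();
            }
            let kind = match &src[start..end] {
                "let" => TokenKind::Let,
                word => TokenKind::Ident(word.to_string()),
            };
            tokens.push((kind, Span { start: start as u32, end: end as u32 }));
        } else if c == '=' {
            chars.next();
            tokens.push((TokenKind::Eq, Span { start: i as u32, end: i as u32 + 1 }));
        } else {
            chars.next(); // the real lexer would record a LexError here
        }
    }
    tokens
}

fn main() {
    let tokens = lex("let x = 42");
    assert_eq!(tokens.len(), 4);
    assert_eq!(tokens[0], (TokenKind::Let, Span { start: 0, end: 3 }));
    assert_eq!(tokens[1], (TokenKind::Ident("x".into()), Span { start: 4, end: 5 }));
    assert_eq!(tokens[3], (TokenKind::Int(42), Span { start: 8, end: 10 }));
}
```

Running this on `"let x = 42"` produces the four tokens and spans shown in the example above.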
Stage 2: Parsing
Input: TokenList
Output: ParseResult { module: Module, arena: ExprArena, errors }
pub struct Module {
functions: Vec<Function>,
types: Vec<TypeDef>,
tests: Vec<Test>,
imports: Vec<Import>,
}
pub struct ExprArena {
exprs: Vec<Expr>, // Indexed by ExprId
}
pub struct Expr {
kind: ExprKind,
span: Span,
}
Data transformations:
- Token stream -> AST nodes
- Expressions -> Arena-allocated (get ExprId)
- Syntax errors -> ParseError list
Example:
TokenList [Let, Ident(x), Eq, Int(42)]
-> Module {
functions: [],
types: [],
...
}
-> ExprArena {
exprs: [
Expr { kind: Let { name: x, value: ExprId(1) } },
Expr { kind: Literal(Int(42)) },
]
}
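The arena pattern behind `ExprArena` can be sketched as follows. This is an illustrative miniature, not the real parser: the field and method names (`alloc`, `get`) are assumptions, and the allocation order shown (children before parents) is one common convention and may differ from the actual parser's.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct ExprId(u32);

#[derive(Debug, PartialEq)]
enum ExprKind {
    Literal(i64),
    Let { name: String, value: ExprId },
}

#[derive(Default)]
struct ExprArena {
    exprs: Vec<ExprKind>,
}

impl ExprArena {
    // Push an expression and return its stable index.
    fn alloc(&mut self, expr: ExprKind) -> ExprId {
        let id = ExprId(self.exprs.len() as u32);
        self.exprs.push(expr);
        id
    }

    fn get(&self, id: ExprId) -> &ExprKind {
        &self.exprs[id.0 as usize]
    }
}

fn main() {
    let mut arena = ExprArena::default();
    // Allocating the child first means the parent can embed an
    // already-valid ExprId instead of a pointer.
    let value = arena.alloc(ExprKind::Literal(42));
    let binding = arena.alloc(ExprKind::Let { name: "x".into(), value });
    assert_eq!(value, ExprId(0));
    assert_eq!(
        arena.get(binding),
        &ExprKind::Let { name: "x".into(), value: ExprId(0) }
    );
}
```

Because expressions are plain indices into one `Vec`, the AST needs no lifetimes or boxing, and `ExprId` is `Copy`, which is what lets later stages (like `expr_types`) index results in parallel vectors.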
Stage 3: Type Checking
Input: Module + ExprArena
Output: TypedModule
pub struct TypedModule {
expr_types: Vec<Type>, // expr_types[expr_id] = type of that expr
errors: Vec<TypeError>,
}
pub enum Type {
Int,
Float,
Bool,
String,
List(Box<Type>),
Option(Box<Type>),
Function { params: Vec<Type>, ret: Box<Type> },
TypeVar(TypeVarId),
// ...
}
Data transformations:
- Expressions -> Type annotations (stored parallel to ExprArena)
- Type variables -> Resolved types (via unification)
- Type mismatches -> TypeError list
Example:
ExprArena[ExprId(0)] = Let { name: x, value: ExprId(1) }
ExprArena[ExprId(1)] = Literal(Int(42))
-> TypedModule {
expr_types: [
Type::Int, // ExprId(0) has type Int (from binding)
Type::Int, // ExprId(1) has type Int (literal)
],
errors: [],
}
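The "parallel to ExprArena" storage can be sketched like this. It is a deliberately tiny model: the real checker performs unification over type variables, whereas this sketch only handles literals and bindings, and it assumes (as in the example above) that a `Let` node's value has a higher `ExprId` than the `Let` itself.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Type { Int, Bool }

#[derive(Clone, Copy)]
struct ExprId(u32);

enum ExprKind {
    IntLit(i64),
    BoolLit(bool),
    Let { value: ExprId },
}

// Produce one Type per expression, indexed by ExprId --
// the same shape as TypedModule::expr_types.
fn check(exprs: &[ExprKind]) -> Vec<Type> {
    let mut expr_types = vec![Type::Int; exprs.len()];
    // Walk in reverse so a Let sees its value's type first.
    for i in (0..exprs.len()).rev() {
        expr_types[i] = match &exprs[i] {
            ExprKind::IntLit(_) => Type::Int,
            ExprKind::BoolLit(_) => Type::Bool,
            // A binding takes the type of its bound value.
            ExprKind::Let { value } => expr_types[value.0 as usize].clone(),
        };
    }
    expr_types
}

fn main() {
    // ExprId(0) = Let { value: ExprId(1) }, ExprId(1) = 42
    let exprs = vec![ExprKind::Let { value: ExprId(1) }, ExprKind::IntLit(42)];
    assert_eq!(check(&exprs), vec![Type::Int, Type::Int]);
    assert_eq!(check(&[ExprKind::BoolLit(true)]), vec![Type::Bool]);
}
```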
Stage 4: Evaluation
Input: Module + ExprArena + TypedModule
Output: ModuleEvalResult
pub struct ModuleEvalResult {
value: Value,
output: EvalOutput,
}
pub enum Value {
Int(i64),
Float(f64),
Bool(bool),
String(Arc<String>),
List(Arc<Vec<Value>>),
Function(FunctionValue),
// ...
}
pub struct EvalOutput {
stdout: String,
stderr: String,
}
Data transformations:
- AST + Types -> Runtime Values
- Function calls -> Stack frame creation
- Pattern expressions -> Pattern evaluation
- Print calls -> Output capture
Example:
Module with @main() -> int = 42
-> ModuleEvalResult {
value: Value::Int(42),
output: EvalOutput { stdout: "", stderr: "" },
}
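The "AST + Types -> Runtime Values" step is, at its core, a recursive walk over the arena by `ExprId`. A minimal sketch (illustrative names; the real evaluator also manages stack frames, patterns, and output capture):

```rust
#[derive(Debug, PartialEq)]
enum Value { Int(i64) }

#[derive(Clone, Copy)]
struct ExprId(u32);

enum ExprKind {
    IntLit(i64),
    Add(ExprId, ExprId),
}

// Evaluate the expression at `id` by indexing into the arena,
// recursing into children by their ExprIds.
fn eval(exprs: &[ExprKind], id: ExprId) -> Value {
    match &exprs[id.0 as usize] {
        ExprKind::IntLit(n) => Value::Int(*n),
        ExprKind::Add(l, r) => {
            // The type checker already guaranteed both sides are ints,
            // so the evaluator can destructure without a runtime check.
            let Value::Int(a) = eval(exprs, *l);
            let Value::Int(b) = eval(exprs, *r);
            Value::Int(a + b)
        }
    }
}

fn main() {
    // ExprId(0) = 40 + 2
    let exprs = vec![
        ExprKind::Add(ExprId(1), ExprId(2)),
        ExprKind::IntLit(40),
        ExprKind::IntLit(2),
    ];
    assert_eq!(eval(&exprs, ExprId(0)), Value::Int(42));
}
```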
Cross-Cutting Data
Spans
Spans track source locations throughout:
pub struct Span {
start: u32,
end: u32,
}
Used in:
- Tokens (where in source)
- Expressions (where in source)
- Errors (where the problem is)
- Code fixes (where to apply fix)
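Because a `Span` is just a byte range into the original source, any consumer can recover the offending text by slicing. A sketch (the `snippet` helper is hypothetical, not part of the compiler):

```rust
struct Span { start: u32, end: u32 }

// Slice the original source by a span's byte range.
fn snippet<'a>(src: &'a str, span: &Span) -> &'a str {
    &src[span.start as usize..span.end as usize]
}

fn main() {
    let src = "let x = 42";
    // Span(8,10) from the lexing example above points at the literal.
    assert_eq!(snippet(src, &Span { start: 8, end: 10 }), "42");
}
```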
Interned Names
All identifiers use interned names:
pub struct Name(u32);
// The Interner maps Name <-> String
interner.intern("foo") // -> Name(0)
interner.resolve(Name(0)) // -> "foo"
Used in:
- Token identifiers
- AST variable names
- Type names
- Pattern names
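The `intern`/`resolve` pair above can be sketched with a map plus a vector. This is a minimal model of the pattern, not the actual `Interner` (which may avoid the duplicate string allocation this version makes for simplicity):

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Name(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<String, Name>, // string -> Name
    strings: Vec<String>,       // Name.0 indexes into this
}

impl Interner {
    // Each distinct string gets one Name; re-interning is a map lookup.
    fn intern(&mut self, s: &str) -> Name {
        if let Some(&name) = self.map.get(s) {
            return name;
        }
        let name = Name(self.strings.len() as u32);
        self.strings.push(s.to_string());
        // Note: stores the string twice for simplicity; a production
        // interner would share the allocation.
        self.map.insert(s.to_string(), name);
        name
    }

    fn resolve(&self, name: Name) -> &str {
        &self.strings[name.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let foo = interner.intern("foo");
    assert_eq!(foo, Name(0));
    assert_eq!(interner.intern("foo"), foo); // same Name on re-intern
    assert_eq!(interner.resolve(foo), "foo");
}
```

The payoff is that every later stage compares identifiers by a `u32` equality check instead of a string comparison.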
Error Types
Each stage has its own error type:
// Lexer (rare)
pub struct LexError { span: Span, message: String }
// Parser
pub struct ParseError { span: Span, expected: Vec<TokenKind>, found: TokenKind }
// Type checker
pub struct TypeError { span: Span, kind: TypeErrorKind }
// Evaluator
pub struct EvalError { span: Span, kind: EvalErrorKind }
All errors carry spans for accurate reporting.
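For reporting, a span's byte offsets are typically translated into a line/column position. A sketch of that translation (the `line_col` helper is hypothetical; the real diagnostics layer may cache line-start offsets instead of rescanning):

```rust
// Convert a byte offset into a 1-based (line, column) pair by
// scanning the source up to that offset.
fn line_col(src: &str, offset: u32) -> (usize, usize) {
    let upto = &src[..offset as usize];
    let line = upto.matches('\n').count() + 1;
    // Characters after the last newline give the column.
    let col = upto.rsplit('\n').next().unwrap().chars().count() + 1;
    (line, col)
}

fn main() {
    let src = "let x =\nlet y = 42";
    // Byte offset 8 is the first character of the second line.
    assert_eq!(line_col(src, 8), (2, 1));
    assert_eq!(line_col(src, 0), (1, 1));
}
```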
Memory Flow
flowchart TB
A["Source Text (owned String)"] --> B["Interned (shared via Interner)"]
A --> C["TokenList (owns Vec<Token>)"]
C -->|"tokens consumed by parser"| D["Module + ExprArena (owns AST nodes)"]
D -->|"borrowed by type checker"| E["TypedModule (owns Vec<Type>)"]
E -->|"all borrowed by evaluator"| F["Value (owns runtime data via Arc)"]
Key points:
- No data is copied between stages
- AST is borrowed, not cloned, after parsing
- Values use Arc for safe sharing in closures
- Salsa handles caching and lifetime management
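The last two points can be seen in miniature: cloning an `Arc`-backed `Value` copies a pointer and bumps a reference count, so a closure capturing a list shares the backing data rather than deep-copying it. A sketch using a reduced `Value`:

```rust
use std::sync::Arc;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    List(Arc<Vec<Value>>),
}

fn main() {
    let xs = Value::List(Arc::new(vec![Value::Int(1), Value::Int(2)]));
    // What a closure capture does: a refcount bump, not a deep copy.
    let captured = xs.clone();
    if let (Value::List(a), Value::List(b)) = (&xs, &captured) {
        assert!(Arc::ptr_eq(a, b)); // both point at the same allocation
    }
}
```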