Implementation Overview
This document describes the implementation of the Ori formatter in compiler/ori_fmt/.
Reference: Go’s gofmt
The Ori formatter follows gofmt’s philosophy: minimal-config (width only), deterministic, implementation is the spec.
Key techniques from gofmt that inform Ori’s implementation:
| Technique | gofmt | Ori adaptation |
|---|---|---|
| Two-phase pipeline | AST printer → tabwriter | Width calculator → formatter |
| Idempotence | Core guarantee | Core guarantee |
| Configuration | Deliberately denied | Width only (--width=N) |
Architecture
The formatter operates on the parsed AST and produces formatted source text.
Source Text → Lexer → Parser → AST → Formatter → Formatted Text
Crate Structure
The formatter uses a 5-layer architecture. See 5-Layer Architecture for design details.
compiler/ori_fmt/
├── src/
│ ├── lib.rs # Public API, tabs_to_spaces()
│ │
│ │ # === 5-Layer Architecture ===
│ ├── spacing/ # Layer 1: Token spacing (O(1) lookup)
│ │ ├── action.rs # SpaceAction enum
│ │ ├── category.rs # TokenCategory grouping
│ │ ├── matcher.rs # TokenMatcher patterns
│ │ ├── rules.rs # Declarative spacing rules
│ │ ├── lookup.rs # RulesMap O(1) lookup
│ │ └── tests.rs
│ ├── packing/ # Layer 2: Container packing decisions
│ │ ├── strategy.rs # Packing enum, determine_packing()
│ │ ├── construct.rs # ConstructKind (22 container types)
│ │ ├── separator.rs # Separator enum
│ │ ├── simple.rs # Simple item detection
│ │ └── tests.rs
│ ├── shape/ # Layer 3: Width tracking
│ │ ├── core.rs # Shape struct
│ │ └── tests.rs
│ ├── rules/ # Layer 4: Breaking rules (8 rules)
│ │ ├── method_chain.rs
│ │ ├── short_body.rs
│ │ ├── boolean_break.rs
│ │ ├── chained_else_if.rs
│ │ ├── nested_for.rs
│ │ ├── parentheses.rs
│ │ ├── loop_rule.rs
│ │ └── tests.rs
│ ├── formatter/ # Layer 5: Orchestration
│ │ ├── mod.rs # Main Formatter struct
│ │ ├── inline.rs # Single-line rendering
│ │ ├── broken.rs # Multi-line rendering
│ │ ├── stacked.rs # Always-stacked constructs
│ │ ├── helpers.rs # Collection helpers
│ │ ├── patterns.rs # Pattern rendering
│ │ ├── literals.rs # Literal rendering
│ │ └── tests.rs
│ │
│ │ # === Support Modules ===
│ ├── width/ # Width calculation
│ │ ├── mod.rs # WidthCalculator, ALWAYS_STACKED
│ │ ├── calls.rs # Call/method call width
│ │ ├── collections.rs # List/map/tuple/struct width
│ │ ├── compounds.rs # Duration/size width
│ │ ├── control.rs # Control flow width
│ │ ├── helpers.rs # Shared utilities
│ │ ├── literals.rs # Literal width
│ │ ├── operators.rs # Binary/unary operator width
│ │ ├── patterns.rs # Binding pattern width
│ │ └── wrappers.rs # Ok/Err/Some/etc. width
│ ├── declarations/ # Module-level formatting
│ ├── comments/ # Comment preservation
│ ├── context/ # FormatContext, column/indent tracking
│ ├── emitter/ # Emitter trait, StringEmitter
│ └── incremental/ # Incremental formatting for LSP
└── tests/
├── golden_tests.rs # Golden file tests
├── idempotence_tests.rs # Idempotence verification
├── incremental_tests.rs # Incremental formatting tests
├── property_tests.rs # Property-based tests
└── width_tests.rs # Width calculation tests
Key Components
| Component | Location | Purpose |
|---|---|---|
SpaceAction | spacing/action.rs | Token spacing decisions (Layer 1) |
TokenCategory | spacing/category.rs | Abstract token types for matching (Layer 1) |
Packing | packing/strategy.rs | Container packing strategies (Layer 2) |
ConstructKind | packing/construct.rs | Container type enumeration (Layer 2) |
Shape | shape/core.rs | Width tracking through recursion (Layer 3) |
*Rule | rules/*.rs | 8 Ori-specific breaking rules (Layer 4) |
Formatter | formatter/mod.rs | Top-down rendering, orchestration (Layer 5) |
WidthCalculator | width/mod.rs | Bottom-up width calculation with caching |
FormatContext | context/ | Column tracking, indentation, line width checking |
ModuleFormatter | declarations/ | Module-level formatting (functions, types, etc.) |
CommentIndex | comments/ | Comment association and doc comment reordering |
Emitter | emitter/ | Output abstraction (StringEmitter implemented) |
Width Calculation
The WidthCalculator Struct
The WidthCalculator performs bottom-up traversal to compute inline width of each expression. Results are cached in an FxHashMap for efficiency.
pub struct WidthCalculator<'a, I: StringLookup> {
arena: &'a ExprArena,
interner: &'a I,
cache: FxHashMap<ExprId, usize>,
}
The ALWAYS_STACKED Sentinel
Some constructs bypass width-based decisions and always use stacked format:
pub const ALWAYS_STACKED: usize = usize::MAX;
When width() returns ALWAYS_STACKED, the formatter skips inline rendering and goes directly to stacked format.
Always-stacked constructs:
{ }blocks,try(sequential blocks)match(arms always stack)recurse,parallel,spawn,catchnursery
Width Formulas
| Construct | Width Formula |
|---|---|
| Identifier | name.len() |
| Integer literal | digit count |
| Float literal | formatted string length |
| String literal | content.len() + 2 (quotes) |
| Binary expr | left + op_width + right (3 for spaced ops like +) |
| Function call | func + 1 + args_width + 1 |
| Named argument | name + 2 + value (: ) |
| Struct literal | name + 3 + fields_width + 2 ({ + }) |
| List | 2 + items_width ([ + ]) |
| Map | 2 + entries_width ({ + }) |
| Tuple | 2 + items_width + trailing_comma |
Width Calculation Example
impl<'a, I: StringLookup> WidthCalculator<'a, I> {
pub fn width(&mut self, expr_id: ExprId) -> usize {
if let Some(&cached) = self.cache.get(&expr_id) {
return cached;
}
let width = self.calculate_width(expr_id);
self.cache.insert(expr_id, width);
width
}
fn calculate_width(&mut self, expr_id: ExprId) -> usize {
let expr = self.arena.get_expr(expr_id);
match &expr.kind {
ExprKind::Int(n) => int_width(*n),
ExprKind::Binary { op, left, right } => {
let left_w = self.width(*left);
let right_w = self.width(*right);
if left_w == ALWAYS_STACKED || right_w == ALWAYS_STACKED {
return ALWAYS_STACKED;
}
left_w + binary_op_width(*op) + right_w
}
ExprKind::Block { .. } => ALWAYS_STACKED,
// ... other cases
}
}
}
Formatting Algorithm
The Formatter Struct
The Formatter wraps a width calculator and format context to produce formatted output:
pub struct Formatter<'a, I: StringLookup> {
arena: &'a ExprArena,
interner: &'a I,
width_calc: WidthCalculator<'a, I>,
ctx: FormatContext<StringEmitter>,
}
Three Rendering Modes
The formatter uses three rendering modes:
- Inline (
emit_inline): Single-line, all content on current line - Broken (
emit_broken): Multi-line, content breaks according to construct rules - Stacked (
emit_stacked): Always multi-line, for blocks/try/match/etc.
pub fn format(&mut self, expr_id: ExprId) {
let width = self.width_calc.width(expr_id);
if width == ALWAYS_STACKED {
self.emit_stacked(expr_id);
} else if self.ctx.fits(width) {
self.emit_inline(expr_id);
} else {
self.emit_broken(expr_id);
}
}
Format Context
The FormatContext tracks state during formatting:
pub struct FormatContext<E: Emitter> {
emitter: E,
column: usize, // Current column (0-indexed)
indent_level: usize, // Nesting depth
config: FormatConfig, // Max width, etc.
}
Key methods:
fits(width): Returnstrueifcolumn + width <= max_widthemit(text): Emit text and update columnemit_newline(): Emit newline and reset column to 0emit_indent(): Emit indentation spacesindent()/dedent(): Adjust indentation level
Breaking Behavior
Binary expressions break before the operator:
fn emit_broken(&mut self, expr_id: ExprId) {
// ...
ExprKind::Binary { op, left, right } => {
self.format(*left);
self.ctx.emit_newline_indent();
self.ctx.emit(binary_op_str(*op));
self.ctx.emit_space();
self.format(*right);
}
// ...
}
Collections have two breaking modes:
- Simple items (literals, identifiers): wrap multiple per line
- Complex items (structs, calls, nested): one per line
fn emit_broken_list(&mut self, items: &[ExprId]) {
let all_simple = items.iter().all(|id| self.is_simple_item(*id));
if all_simple {
self.emit_broken_list_wrap(items); // Multiple per line
} else {
self.emit_broken_list_one_per_line(items);
}
}
Comment Handling
Comment Association
Comments are associated with AST nodes by source position. A comment “belongs to” the node that immediately follows it.
pub struct CommentIndex {
comments_by_position: BTreeMap<u32, Vec<CommentRef>>,
consumed: Vec<bool>,
}
Doc Comment Reordering
Doc comments are reordered to canonical order:
// #Description(may span multiple lines)// @param name(in signature order)// @field name(in struct order)// !Warningor// !Error// >example -> result
Regular comments (//) preserve their original order.
Function Parameter Reordering
For functions, @param comments are reordered to match parameter order:
pub fn take_comments_before_function<I: StringLookup>(
&mut self,
pos: u32,
param_names: &[&str], // From function signature
comments: &CommentList,
interner: &I,
) -> Vec<usize>
Declaration Formatting
Module Structure
ModuleFormatter handles top-level declarations in order:
- Imports (stdlib first, then relative)
- Constants/configs
- Type definitions
- Traits
- Impls
- Functions
- Tests
Blank lines separate different declaration kinds.
Function Signatures
Function formatting considers trailing width (return type, capabilities, where clauses) when deciding whether to break parameters:
fn calculate_function_trailing_width(&mut self, func: &Function) -> usize {
let mut width = 0;
// Return type: " -> Type"
if let Some(ref ret_ty) = func.return_ty {
width += 4 + self.calculate_type_width(ret_ty);
}
// Capabilities, where clauses, " = "
// ...
width
}
Function Bodies
Function bodies break differently based on construct type:
- Conditionals break to new line with indent
- Always-stacked constructs stay on same line, break internally
- Other constructs break internally as needed
Incremental Formatting
Purpose
Format only declarations that overlap with a changed region, rather than reformatting the entire file. Useful for LSP format-on-type.
API
pub fn format_incremental<I: StringLookup>(
module: &Module,
comments: &CommentList,
arena: &ExprArena,
interner: &I,
change_start: usize,
change_end: usize,
) -> IncrementalResult
Results
Regions(Vec<FormattedRegion>): Specific regions to replaceFullFormatNeeded: Change affects imports/configs, need full formatNoChangeNeeded: Change is between declarations
Limitations
- Minimum unit is a complete top-level declaration
- Import and config changes require full format (block-formatted)
- Multi-declaration changes format all affected declarations
Testing
Test Types
| Test Type | Location | Purpose |
|---|---|---|
| Golden tests | tests/golden_tests.rs | Compare against expected output |
| Idempotence | tests/idempotence_tests.rs | Verify format(format(x)) == format(x) |
| Property tests | tests/property_tests.rs | Random input validation |
| Width tests | tests/width_tests.rs | Width calculation accuracy |
| Incremental | tests/incremental_tests.rs | Incremental formatting |
Idempotence Testing
fn test_idempotence(input: &str) {
let first = format(input);
let second = format(&first);
assert_eq!(first, second);
}
Property-Based Testing
Verifies that formatting preserves semantics by comparing ASTs:
fn test_semantic_preservation(input: &str) {
let original_ast = parse(input);
let formatted = format(input);
let formatted_ast = parse(&formatted);
assert_eq!(strip_spans(&original_ast), strip_spans(&formatted_ast));
}
Tooling Integration
LSP Server
The LSP server (ori_lsp) provides formatting via textDocument/formatting:
// In ori_lsp
pub fn format_document(source: &str) -> String {
let (module, comments, arena, interner) = parse(source);
format_module_with_comments(&module, &comments, &arena, &interner)
}
Playground
The Ori Playground uses format-on-run: code is automatically formatted when the user clicks Run. The LSP compiles to WASM and runs in-browser.
CLI
ori fmt src/ # Format all .ori files in directory
ori fmt file.ori # Format single file
ori check --fmt # Check formatting without modifying
Performance
Caching
Width calculations are cached per expression ID:
cache: FxHashMap<ExprId, usize>
Pre-allocation
The formatter pre-allocates based on estimated output size:
pub fn with_capacity(capacity: usize) -> Self {
Self::with_emitter(StringEmitter::with_capacity(capacity))
}
Output Abstraction
The Emitter trait enables different output targets:
pub trait Emitter {
fn emit(&mut self, text: &str);
fn emit_space(&mut self);
fn emit_newline(&mut self);
fn emit_indent(&mut self, level: usize);
}
Currently only StringEmitter is implemented. The trait exists to enable future file-streaming output without building full strings in memory.