Implementation Overview

This document describes the implementation of the Ori formatter in compiler/ori_fmt/.

Reference: Go’s gofmt

The Ori formatter follows gofmt’s philosophy: minimal-config (width only), deterministic, implementation is the spec.

Key techniques from gofmt that inform Ori’s implementation:

TechniquegofmtOri adaptation
Two-phase pipelineAST printer → tabwriterWidth calculator → formatter
IdempotenceCore guaranteeCore guarantee
ConfigurationDeliberately deniedWidth only (--width=N)

Architecture

The formatter operates on the parsed AST and produces formatted source text.

Source Text → Lexer → Parser → AST → Formatter → Formatted Text

Crate Structure

The formatter uses a 5-layer architecture. See 5-Layer Architecture for design details.

compiler/ori_fmt/
├── src/
│   ├── lib.rs              # Public API, tabs_to_spaces()
│   │
│   │   # === 5-Layer Architecture ===
│   ├── spacing/            # Layer 1: Token spacing (O(1) lookup)
│   │   ├── action.rs       # SpaceAction enum
│   │   ├── category.rs     # TokenCategory grouping
│   │   ├── matcher.rs      # TokenMatcher patterns
│   │   ├── rules.rs        # Declarative spacing rules
│   │   ├── lookup.rs       # RulesMap O(1) lookup
│   │   └── tests.rs
│   ├── packing/            # Layer 2: Container packing decisions
│   │   ├── strategy.rs     # Packing enum, determine_packing()
│   │   ├── construct.rs    # ConstructKind (22 container types)
│   │   ├── separator.rs    # Separator enum
│   │   ├── simple.rs       # Simple item detection
│   │   └── tests.rs
│   ├── shape/              # Layer 3: Width tracking
│   │   ├── core.rs         # Shape struct
│   │   └── tests.rs
│   ├── rules/              # Layer 4: Breaking rules (8 rules)
│   │   ├── method_chain.rs
│   │   ├── short_body.rs
│   │   ├── boolean_break.rs
│   │   ├── chained_else_if.rs
│   │   ├── nested_for.rs
│   │   ├── parentheses.rs
│   │   ├── loop_rule.rs
│   │   └── tests.rs
│   ├── formatter/          # Layer 5: Orchestration
│   │   ├── mod.rs          # Main Formatter struct
│   │   ├── inline.rs       # Single-line rendering
│   │   ├── broken.rs       # Multi-line rendering
│   │   ├── stacked.rs      # Always-stacked constructs
│   │   ├── helpers.rs      # Collection helpers
│   │   ├── patterns.rs     # Pattern rendering
│   │   ├── literals.rs     # Literal rendering
│   │   └── tests.rs
│   │
│   │   # === Support Modules ===
│   ├── width/              # Width calculation
│   │   ├── mod.rs          # WidthCalculator, ALWAYS_STACKED
│   │   ├── calls.rs        # Call/method call width
│   │   ├── collections.rs  # List/map/tuple/struct width
│   │   ├── compounds.rs    # Duration/size width
│   │   ├── control.rs      # Control flow width
│   │   ├── helpers.rs      # Shared utilities
│   │   ├── literals.rs     # Literal width
│   │   ├── operators.rs    # Binary/unary operator width
│   │   ├── patterns.rs     # Binding pattern width
│   │   └── wrappers.rs     # Ok/Err/Some/etc. width
│   ├── declarations/       # Module-level formatting
│   ├── comments/           # Comment preservation
│   ├── context/            # FormatContext, column/indent tracking
│   ├── emitter/            # Emitter trait, StringEmitter
│   └── incremental/        # Incremental formatting for LSP
└── tests/
    ├── golden_tests.rs     # Golden file tests
    ├── idempotence_tests.rs # Idempotence verification
    ├── incremental_tests.rs # Incremental formatting tests
    ├── property_tests.rs   # Property-based tests
    └── width_tests.rs      # Width calculation tests

Key Components

ComponentLocationPurpose
SpaceActionspacing/action.rsToken spacing decisions (Layer 1)
TokenCategoryspacing/category.rsAbstract token types for matching (Layer 1)
Packingpacking/strategy.rsContainer packing strategies (Layer 2)
ConstructKindpacking/construct.rsContainer type enumeration (Layer 2)
Shapeshape/core.rsWidth tracking through recursion (Layer 3)
*Rulerules/*.rs8 Ori-specific breaking rules (Layer 4)
Formatterformatter/mod.rsTop-down rendering, orchestration (Layer 5)
WidthCalculatorwidth/mod.rsBottom-up width calculation with caching
FormatContextcontext/Column tracking, indentation, line width checking
ModuleFormatterdeclarations/Module-level formatting (functions, types, etc.)
CommentIndexcomments/Comment association and doc comment reordering
Emitteremitter/Output abstraction (StringEmitter implemented)

Width Calculation

The WidthCalculator Struct

The WidthCalculator performs bottom-up traversal to compute inline width of each expression. Results are cached in an FxHashMap for efficiency.

pub struct WidthCalculator<'a, I: StringLookup> {
    arena: &'a ExprArena,
    interner: &'a I,
    cache: FxHashMap<ExprId, usize>,
}

The ALWAYS_STACKED Sentinel

Some constructs bypass width-based decisions and always use stacked format:

pub const ALWAYS_STACKED: usize = usize::MAX;

When width() returns ALWAYS_STACKED, the formatter skips inline rendering and goes directly to stacked format.

Always-stacked constructs:

  • { } blocks, try (sequential blocks)
  • match (arms always stack)
  • recurse, parallel, spawn, catch
  • nursery

Width Formulas

ConstructWidth Formula
Identifiername.len()
Integer literaldigit count
Float literalformatted string length
String literalcontent.len() + 2 (quotes)
Binary exprleft + op_width + right (3 for spaced ops like +)
Function callfunc + 1 + args_width + 1
Named argumentname + 2 + value (: )
Struct literalname + 3 + fields_width + 2 ({ + })
List2 + items_width ([ + ])
Map2 + entries_width ({ + })
Tuple2 + items_width + trailing_comma

Width Calculation Example

impl<'a, I: StringLookup> WidthCalculator<'a, I> {
    pub fn width(&mut self, expr_id: ExprId) -> usize {
        if let Some(&cached) = self.cache.get(&expr_id) {
            return cached;
        }

        let width = self.calculate_width(expr_id);
        self.cache.insert(expr_id, width);
        width
    }

    fn calculate_width(&mut self, expr_id: ExprId) -> usize {
        let expr = self.arena.get_expr(expr_id);

        match &expr.kind {
            ExprKind::Int(n) => int_width(*n),
            ExprKind::Binary { op, left, right } => {
                let left_w = self.width(*left);
                let right_w = self.width(*right);
                if left_w == ALWAYS_STACKED || right_w == ALWAYS_STACKED {
                    return ALWAYS_STACKED;
                }
                left_w + binary_op_width(*op) + right_w
            }
            ExprKind::Block { .. } => ALWAYS_STACKED,
            // ... other cases
        }
    }
}

Formatting Algorithm

The Formatter Struct

The Formatter wraps a width calculator and format context to produce formatted output:

pub struct Formatter<'a, I: StringLookup> {
    arena: &'a ExprArena,
    interner: &'a I,
    width_calc: WidthCalculator<'a, I>,
    ctx: FormatContext<StringEmitter>,
}

Three Rendering Modes

The formatter uses three rendering modes:

  1. Inline (emit_inline): Single-line, all content on current line
  2. Broken (emit_broken): Multi-line, content breaks according to construct rules
  3. Stacked (emit_stacked): Always multi-line, for blocks/try/match/etc.
pub fn format(&mut self, expr_id: ExprId) {
    let width = self.width_calc.width(expr_id);

    if width == ALWAYS_STACKED {
        self.emit_stacked(expr_id);
    } else if self.ctx.fits(width) {
        self.emit_inline(expr_id);
    } else {
        self.emit_broken(expr_id);
    }
}

Format Context

The FormatContext tracks state during formatting:

pub struct FormatContext<E: Emitter> {
    emitter: E,
    column: usize,        // Current column (0-indexed)
    indent_level: usize,  // Nesting depth
    config: FormatConfig, // Max width, etc.
}

Key methods:

  • fits(width): Returns true if column + width <= max_width
  • emit(text): Emit text and update column
  • emit_newline(): Emit newline and reset column to 0
  • emit_indent(): Emit indentation spaces
  • indent() / dedent(): Adjust indentation level

Breaking Behavior

Binary expressions break before the operator:

fn emit_broken(&mut self, expr_id: ExprId) {
    // ...
    ExprKind::Binary { op, left, right } => {
        self.format(*left);
        self.ctx.emit_newline_indent();
        self.ctx.emit(binary_op_str(*op));
        self.ctx.emit_space();
        self.format(*right);
    }
    // ...
}

Collections have two breaking modes:

  • Simple items (literals, identifiers): wrap multiple per line
  • Complex items (structs, calls, nested): one per line
fn emit_broken_list(&mut self, items: &[ExprId]) {
    let all_simple = items.iter().all(|id| self.is_simple_item(*id));

    if all_simple {
        self.emit_broken_list_wrap(items);     // Multiple per line
    } else {
        self.emit_broken_list_one_per_line(items);
    }
}

Comment Handling

Comment Association

Comments are associated with AST nodes by source position. A comment “belongs to” the node that immediately follows it.

pub struct CommentIndex {
    comments_by_position: BTreeMap<u32, Vec<CommentRef>>,
    consumed: Vec<bool>,
}

Doc Comment Reordering

Doc comments are reordered to canonical order:

  1. // #Description (may span multiple lines)
  2. // @param name (in signature order)
  3. // @field name (in struct order)
  4. // !Warning or // !Error
  5. // >example -> result

Regular comments (//) preserve their original order.

Function Parameter Reordering

For functions, @param comments are reordered to match parameter order:

pub fn take_comments_before_function<I: StringLookup>(
    &mut self,
    pos: u32,
    param_names: &[&str],  // From function signature
    comments: &CommentList,
    interner: &I,
) -> Vec<usize>

Declaration Formatting

Module Structure

ModuleFormatter handles top-level declarations in order:

  1. Imports (stdlib first, then relative)
  2. Constants/configs
  3. Type definitions
  4. Traits
  5. Impls
  6. Functions
  7. Tests

Blank lines separate different declaration kinds.

Function Signatures

Function formatting considers trailing width (return type, capabilities, where clauses) when deciding whether to break parameters:

fn calculate_function_trailing_width(&mut self, func: &Function) -> usize {
    let mut width = 0;
    // Return type: " -> Type"
    if let Some(ref ret_ty) = func.return_ty {
        width += 4 + self.calculate_type_width(ret_ty);
    }
    // Capabilities, where clauses, " = "
    // ...
    width
}

Function Bodies

Function bodies break differently based on construct type:

  • Conditionals break to new line with indent
  • Always-stacked constructs stay on same line, break internally
  • Other constructs break internally as needed

Incremental Formatting

Purpose

Format only declarations that overlap with a changed region, rather than reformatting the entire file. Useful for LSP format-on-type.

API

pub fn format_incremental<I: StringLookup>(
    module: &Module,
    comments: &CommentList,
    arena: &ExprArena,
    interner: &I,
    change_start: usize,
    change_end: usize,
) -> IncrementalResult

Results

  • Regions(Vec<FormattedRegion>): Specific regions to replace
  • FullFormatNeeded: Change affects imports/configs, need full format
  • NoChangeNeeded: Change is between declarations

Limitations

  • Minimum unit is a complete top-level declaration
  • Import and config changes require full format (block-formatted)
  • Multi-declaration changes format all affected declarations

Testing

Test Types

Test TypeLocationPurpose
Golden teststests/golden_tests.rsCompare against expected output
Idempotencetests/idempotence_tests.rsVerify format(format(x)) == format(x)
Property teststests/property_tests.rsRandom input validation
Width teststests/width_tests.rsWidth calculation accuracy
Incrementaltests/incremental_tests.rsIncremental formatting

Idempotence Testing

fn test_idempotence(input: &str) {
    let first = format(input);
    let second = format(&first);
    assert_eq!(first, second);
}

Property-Based Testing

Verifies that formatting preserves semantics by comparing ASTs:

fn test_semantic_preservation(input: &str) {
    let original_ast = parse(input);
    let formatted = format(input);
    let formatted_ast = parse(&formatted);
    assert_eq!(strip_spans(&original_ast), strip_spans(&formatted_ast));
}

Tooling Integration

LSP Server

The LSP server (ori_lsp) provides formatting via textDocument/formatting:

// In ori_lsp
pub fn format_document(source: &str) -> String {
    let (module, comments, arena, interner) = parse(source);
    format_module_with_comments(&module, &comments, &arena, &interner)
}

Playground

The Ori Playground uses format-on-run: code is automatically formatted when the user clicks Run. The LSP compiles to WASM and runs in-browser.

CLI

ori fmt src/         # Format all .ori files in directory
ori fmt file.ori     # Format single file
ori check --fmt      # Check formatting without modifying

Performance

Caching

Width calculations are cached per expression ID:

cache: FxHashMap<ExprId, usize>

Pre-allocation

The formatter pre-allocates based on estimated output size:

pub fn with_capacity(capacity: usize) -> Self {
    Self::with_emitter(StringEmitter::with_capacity(capacity))
}

Output Abstraction

The Emitter trait enables different output targets:

pub trait Emitter {
    fn emit(&mut self, text: &str);
    fn emit_space(&mut self);
    fn emit_newline(&mut self);
    fn emit_indent(&mut self, level: usize);
}

Currently only StringEmitter is implemented. The trait exists to enable future file-streaming output without building full strings in memory.