Arena Allocation

The Ori compiler uses arena allocation for expressions. This document explains the implementation and rationale.

What is Arena Allocation?

Arena allocation (also called “region-based allocation”) allocates objects in a contiguous block of memory. Objects are freed all at once when the arena is dropped, rather than individually.

pub struct ExprArena {
    exprs: Vec<Expr>,
}

The Vec<Expr> is the arena. Expressions are allocated by pushing to the vector and returning an index.

Implementation

ExprId

/// Index into ExprArena
#[derive(Clone, Copy, Eq, PartialEq, Hash, Debug)]
pub struct ExprId(pub u32);

impl ExprId {
    pub const DUMMY: ExprId = ExprId(u32::MAX);

    pub fn index(self) -> usize {
        self.0 as usize
    }
}

ExprId::DUMMY is used as a placeholder during parsing when the actual expression isn’t known yet.

ExprArena

#[derive(Clone, Eq, PartialEq, Hash, Debug, Default)]
pub struct ExprArena {
    exprs: Vec<Expr>,
}

impl ExprArena {
    pub fn new() -> Self {
        Self { exprs: Vec::new() }
    }

    /// Allocate an expression, returning its ID
    pub fn alloc(&mut self, expr: Expr) -> ExprId {
        let id = ExprId(self.exprs.len() as u32);
        self.exprs.push(expr);
        id
    }

    /// Get expression by ID
    pub fn get(&self, id: ExprId) -> &Expr {
        &self.exprs[id.index()]
    }

    /// Get mutable expression by ID
    pub fn get_mut(&mut self, id: ExprId) -> &mut Expr {
        &mut self.exprs[id.index()]
    }

    /// Number of expressions
    pub fn len(&self) -> usize {
        self.exprs.len()
    }

    /// Iterate over all expressions
    pub fn iter(&self) -> impl Iterator<Item = (ExprId, &Expr)> {
        self.exprs
            .iter()
            .enumerate()
            .map(|(i, e)| (ExprId(i as u32), e))
    }
}

Usage During Parsing

impl Parser {
    fn parse_if(&mut self) -> ExprId {
        self.expect(Token::If);
        let condition = self.parse_expr();

        self.expect(Token::Then);
        let then_branch = self.parse_expr();

        let else_branch = if self.check(Token::Else) {
            self.advance();
            Some(self.parse_expr())
        } else {
            None
        };

        // Allocate the If expression
        self.arena.alloc(Expr {
            kind: ExprKind::If {
                condition,
                then_branch,
                else_branch,
            },
            span: self.current_span(),
        })
    }
}

Memory Characteristics

Allocation

  • O(1) amortized - Just a vector push
  • No fragmentation - All expressions contiguous
  • No individual deallocation - Arena freed all at once

Access

  • O(1) lookup - Direct index into vector
  • Good cache locality - Sequential traversal is fast
  • No pointer chasing - IDs are indices, not pointers

Memory Overhead

Per expression:

  • ExprKind: Variable, typically 16-48 bytes
  • Span: 8 bytes
  • No allocation overhead (compared to ~16 bytes for Box)

Comparison with Alternatives

Box

// Box-based
struct Expr {
    kind: ExprKind,
}
enum ExprKind {
    Binary { left: Box<Expr>, right: Box<Expr> },
}

Pros: Simpler code, no arena parameter Cons: Heap allocation per expr, poor locality, not Salsa-compatible

Typed Arenas (typed-arena crate)

// typed-arena based
let arena = Arena::new();
let expr: &Expr = arena.alloc(Expr { ... });

Pros: Returns references, familiar API Cons: References have lifetimes, harder with Salsa

Our Approach (ID-based)

// ID-based
let id: ExprId = arena.alloc(Expr { ... });
let expr: &Expr = arena.get(id);

Pros: IDs are Copy, Salsa-compatible, explicit ownership Cons: Must pass arena around, indirect access

Best Practices

1. Prefer ExprId Over References

// Good: Store IDs
struct Function {
    body: ExprId,
}

// Avoid: Store references (lifetime issues)
struct Function<'a> {
    body: &'a Expr,
}

2. Pass Arena Explicitly

// Good: Explicit arena parameter
fn eval(arena: &ExprArena, id: ExprId) -> Value

// Avoid: Global arena (testing issues)
fn eval(id: ExprId) -> Value

3. Allocate Bottom-Up

When building expressions, allocate children first:

// Correct order for `1 + 2`
let one = arena.alloc(literal(1));     // ID 0
let two = arena.alloc(literal(2));     // ID 1
let add = arena.alloc(binary(one, two)); // ID 2

// Children (0, 1) allocated before parent (2)

4. Use DUMMY Sparingly

// Use for temporary placeholders only
let placeholder = ExprId::DUMMY;
// Then fill in real value later

Parallel Arrays

The arena pattern enables parallel arrays for additional data:

struct TypedModule {
    // Types parallel to ExprArena
    expr_types: Vec<Type>,
}

// Access type for expression
let ty = &typed.expr_types[expr_id.index()];

This keeps the AST immutable while allowing separate type information.

Iteration Patterns

Iterate All Expressions

for (id, expr) in arena.iter() {
    println!("{:?}: {:?}", id, expr.kind);
}

Recursive Traversal

fn visit(arena: &ExprArena, id: ExprId) {
    let expr = arena.get(id);
    match &expr.kind {
        ExprKind::Binary { left, right, .. } => {
            visit(arena, *left);
            visit(arena, *right);
        }
        // ...
    }
}

Collecting IDs

fn collect_literals(arena: &ExprArena, id: ExprId) -> Vec<ExprId> {
    let mut result = Vec::new();
    collect_literals_inner(arena, id, &mut result);
    result
}