ARC System Overview
What Is Automatic Reference Counting?
Every programming language must answer a fundamental question: when a program allocates memory, who decides when to free it? The history of memory management is a history of different answers to this question, each with different tradeoffs between safety, performance, and programmer effort.
Manual memory management puts the burden on the programmer. C requires explicit malloc and free calls; the programmer tracks every allocation’s lifetime and frees it at the right time. This gives maximum control and zero runtime overhead, but it is the single largest source of security vulnerabilities in software history — use-after-free, double-free, and memory leaks account for roughly 70% of critical bugs in large C and C++ codebases (Microsoft Security Response Center, 2019; Google Project Zero data).
Tracing garbage collection automates the problem by periodically scanning the heap to find unreachable objects and freeing them. Java, Go, Python (as a backup), JavaScript, Haskell (GHC), OCaml, and most managed languages use this approach. It eliminates use-after-free entirely, but introduces pause-time unpredictability (stop-the-world collectors), memory overhead (the collector needs headroom to operate efficiently), and throughput cost (typically 5-15% for generational collectors, more for concurrent ones). Languages with GC also tend to allocate more freely, since the cost of allocation is amortized across collection cycles.
Ownership and borrowing is Rust’s answer: a static type system tracks which variable owns each allocation, and the compiler inserts drop calls at the end of each owner’s scope. Borrows (&T, &mut T) allow temporary access without ownership transfer. This achieves manual-memory-management performance with compile-time safety, but it requires the programmer to think about lifetimes and satisfy the borrow checker — a significant learning curve and a constraint on program architecture (cyclic data structures, self-referential types, and certain patterns require unsafe or smart pointers).
Reference counting tracks how many references point to each allocation. When a reference is created, the count increments; when a reference is destroyed, it decrements; when the count reaches zero, the allocation is freed. Objective-C (manual retain/release), Swift (ARC), Python (primary collector), and Perl use this approach. Reference counting has deterministic destruction (objects are freed immediately when unreachable, not at the next GC pause), low memory overhead (no collector headroom), and predictable latency. But naive reference counting has three significant costs: every reference copy requires an atomic increment, every reference destruction requires an atomic decrement (often followed by a branch to check for zero), and cycles require a separate mechanism.
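The core invariant is small enough to sketch in a few lines. This is a minimal, single-threaded illustration (the `RcBox` type and its methods are illustrative, not any real runtime's API; a production runtime would use atomic operations and actually return memory to the allocator):

```rust
// Minimal single-threaded reference-counting cell: increment on every new
// reference, decrement on every destroyed one, free when the count hits zero.
struct RcBox {
    count: usize,
    freed: bool, // stands in for returning the memory to the allocator
}

impl RcBox {
    fn new() -> Self {
        // A fresh allocation starts with exactly one owning reference.
        RcBox { count: 1, freed: false }
    }

    fn inc(&mut self) {
        self.count += 1; // a real runtime would use an atomic increment
    }

    fn dec(&mut self) {
        self.count -= 1; // atomic decrement, then the branch on zero
        if self.count == 0 {
            self.freed = true; // last reference gone: free immediately
        }
    }
}
```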
Automatic Reference Counting (ARC) is reference counting where the compiler — not the programmer — inserts the increment and decrement operations. The programmer writes code as if memory management doesn’t exist; the compiler analyzes the program’s data flow and places RC operations at precisely the right points. Swift popularized this term, but the concept predates it — Objective-C’s ARC (introduced in 2011) was the first widely-deployed automatic RC system, and Lean 4’s compiler (2020+) showed that aggressive RC optimization could make functional languages competitive with imperative ones.
Where Ori Sits
Ori uses value semantics — every assignment is a logical copy, every variable owns its data, there is no shared mutable state. This is the programming model of pure functional languages and of Swift’s value types. It is the simplest model for programmers to reason about: there are no aliasing bugs, no data races, no spooky action at a distance through shared references.
But value semantics, naively implemented, is catastrophically expensive. Every list push copies the entire list. Every struct update allocates a fresh struct. Every function call that passes a collection duplicates it. A language with value semantics needs an aggressive optimization pipeline to make it practical.
Ori’s ARC system is that pipeline. It is a compiler pass (ori_arc crate) that transforms canonical IR into memory-managed code with explicit reference counting, then systematically eliminates the overhead that value semantics would normally impose. The pipeline is backend-independent — it has no LLVM dependency and produces an IR that any backend can consume. The LLVM backend’s arc_emitter translates ARC IR to LLVM IR, but the ARC system itself knows nothing about LLVM.
ARC IR is the sole codegen path — all native code generation flows through ARC IR.
What Makes Ori’s ARC System Distinctive
Eight Stacked Optimizations, Not One
Most ARC implementations apply a single optimization strategy. Swift’s optimizer focuses primarily on removing redundant retain/release pairs through dataflow analysis. Python’s reference counting has no compile-time optimization at all — every reference operation is emitted naively. Ori stacks eight optimization layers that compound on each other, where each layer creates opportunities for the next:
- Type classification eliminates RC for half of all variables (scalars)
- Borrow inference eliminates RC at call sites for read-only parameters
- Perceus insertion places RC at precise last-use points (not scope boundaries)
- Reset/reuse converts drop-then-allocate sequences into in-place reuse
- RC identity normalization enables elimination across field projections
- RC elimination removes canceling inc/dec pairs
- COW runtime provides O(1) mutation for uniquely-owned collections
- Static uniqueness analysis proves ownership at compile time, eliminating runtime checks
Functional Semantics, Imperative Performance
The ARC system’s goal is not just correctness — it is to make pure value semantics compile to the same machine code an imperative programmer would write by hand. A list map function that walks a linked list, pattern-matching each node and constructing a new one, should reuse every node allocation in-place. A loop that pushes elements onto a list should mutate the list’s buffer directly, not copy it on every iteration. The programmer writes functional code; the compiler generates imperative performance.
Backend Independence
The ARC pipeline produces ArcFunction values — basic-block IR with explicit RC operations, ownership annotations, and reuse tokens. This IR is consumed by ori_llvm’s arc_emitter, but the design deliberately keeps ori_arc free of any backend dependency. A future WASM backend, a cranelift backend, or even an interpreter that operates on ARC IR could consume the same output without modification.
Interprocedural Analysis
Most RC optimizers work within a single function. Ori’s borrow inference and uniqueness analysis are interprocedural — they analyze the entire module’s call graph, decompose it into strongly connected components, and iterate to fixpoint. A function that only reads its list parameter gets that parameter marked as Borrowed across the entire program, eliminating RC operations at every call site. This is the difference between optimizing locally and optimizing globally.
How the Pipeline Eliminates Each Cost
1. Type Classification — Skip RC for Value Types
Every type is classified as Scalar (int, bool, small structs — no heap pointer, no RC needed), DefiniteRef (str, collections, closures — always heap-allocated, always needs RC), or PossibleRef (unresolved generics — conservative fallback). Classification is monomorphized: Option<int> is Scalar (tag + int, no heap pointer), Option<str> is DefiniteRef. Roughly half of all variables in a typical program need zero RC operations.
See ARC IR for the classification system.
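As a sketch, the classification reads as a three-way match. The `Repr` enum and `classify` helper below are illustrative stand-ins: the real pass works on monomorphized IR types, not type-name strings.

```rust
// Three-way type classification, sketched over type names for readability.
#[derive(Debug, PartialEq)]
enum Repr {
    Scalar,      // no heap pointer: never needs an RC operation
    DefiniteRef, // always heap-allocated: always needs RC
    PossibleRef, // unresolved generic: conservative fallback
}

fn classify(ty: &str) -> Repr {
    match ty {
        "int" | "bool" | "float" => Repr::Scalar,
        // Monomorphization decides per instantiation:
        "Option<int>" => Repr::Scalar,      // tag + int, no heap pointer
        "Option<str>" => Repr::DefiniteRef, // carries a heap payload
        "str" | "list" | "map" => Repr::DefiniteRef,
        _ => Repr::PossibleRef, // unknown: assume it may hold a reference
    }
}
```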
2. Borrow Inference — Eliminate RC at Call Sites
A global SCC-based fixed-point analysis classifies each function parameter as Borrowed (callee only reads it) or Owned (callee may store, return, or transfer it). Borrowed parameters skip RC entirely at the call site — no increment on call, no decrement on return. The result: len(list) borrows its argument, and the caller never touches the refcount. Without borrow inference, every len() call costs two atomic operations.
See Borrow Inference for the algorithm.
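The shape of the analysis can be sketched as an optimistic fixpoint: assume every parameter is Borrowed, demote it to Owned when the body captures it or forwards it to an Owned position in a callee, and repeat until nothing changes. The `Func`/`ParamUse` encoding below is illustrative; the real pass runs this per SCC of the call graph, which bounds the iteration count, while this sketch just sweeps the whole module.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Ownership { Borrowed, Owned }

// Per-parameter facts for one function: does the body itself capture the
// parameter, and to which callee parameters is it forwarded?
struct ParamUse {
    captured_locally: bool,             // stored, returned, or escapes
    forwarded_to: Vec<(String, usize)>, // (callee name, callee param index)
}

struct Func {
    name: String,
    params: Vec<ParamUse>,
}

fn infer_borrows(funcs: &[Func]) -> HashMap<(String, usize), Ownership> {
    // Start optimistic: everything is Borrowed until proven otherwise.
    let mut sig: HashMap<(String, usize), Ownership> = HashMap::new();
    for f in funcs {
        for i in 0..f.params.len() {
            sig.insert((f.name.clone(), i), Ownership::Borrowed);
        }
    }
    loop {
        let mut changed = false;
        for f in funcs {
            for (i, p) in f.params.iter().enumerate() {
                let owned = p.captured_locally
                    || p.forwarded_to.iter().any(|(callee, j)| {
                        sig.get(&(callee.clone(), *j)) == Some(&Ownership::Owned)
                    });
                let key = (f.name.clone(), i);
                if owned && sig[&key] == Ownership::Borrowed {
                    sig.insert(key, Ownership::Owned); // demote; monotone
                    changed = true;
                }
            }
        }
        if !changed {
            return sig; // fixpoint reached
        }
    }
}
```

Because demotion is monotone (Borrowed can only become Owned, never the reverse), the loop terminates after at most one demotion per parameter.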
3. Perceus RC Insertion — Place RC at Last-Use Points
Once ownership is known, RC operations are placed mechanically via backward liveness analysis. Each variable gets an RcDec at its last use and an RcInc at each additional use beyond the first. This is the Perceus algorithm (Reinking et al., 2021), originally developed for the Koka language. The precision matters because it sets up later passes — conservative placement would hide optimization opportunities.
See RC Insertion for the Perceus algorithm.
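For straight-line code, the placement rule reduces to a single backward scan: the first occurrence seen from the end is a variable's last use (RcDec goes right after it), every earlier use needs an RcInc, and a variable with no uses is dropped at its definition. The instruction encoding and `RcOp` names below are illustrative, not Ori's ArcFunction IR.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum RcOp {
    IncBefore(usize, &'static str), // RcInc(var) before instruction i
    DecAfter(usize, &'static str),  // RcDec(var) after its last use at i
    DecUnused(&'static str),        // defined but never used: drop at once
}

// `defs` lists owned variables defined in the block; `uses[i]` lists the
// variables read by instruction i.
fn place_rc(defs: &[&'static str], uses: &[Vec<&'static str>]) -> Vec<RcOp> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut ops = Vec::new();
    for (i, instr) in uses.iter().enumerate().rev() {
        for v in instr {
            if seen.insert(*v) {
                ops.push(RcOp::DecAfter(i, *v)); // last use found
            } else {
                ops.push(RcOp::IncBefore(i, *v)); // an earlier use: dup first
            }
        }
    }
    for v in defs {
        if !seen.contains(v) {
            ops.push(RcOp::DecUnused(*v)); // dead value: drop immediately
        }
    }
    ops
}
```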
4. Reset/Reuse — Turn Free+Alloc into In-Place Mutation
When an RcDec (about to free a value) is followed by a Construct of the same type (about to allocate), the pipeline replaces both with a reset/reuse pair. At runtime, if the old value is uniquely owned (RC == 1), the allocation is reused in-place — no free, no alloc, no copy. If shared (RC > 1), the slow path decrements and allocates fresh. This is what makes functional pattern matching competitive with imperative mutation.
See Reset/Reuse for detection and expansion.
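The runtime contract of a reset/reuse pair can be sketched on a single cell. `Node`, `reset`, and `reuse_in_place` are illustrative names; a real runtime works on raw heap headers rather than safe references.

```rust
struct Node {
    rc: usize,
    value: i64,
}

// reset: if the dying value is uniquely owned, hand its allocation back as
// a reuse token; if shared, take the slow path and just drop our reference.
fn reset(node: &mut Node) -> Option<&mut Node> {
    if node.rc == 1 {
        Some(node) // unique: skip the free, keep the memory
    } else {
        node.rc -= 1; // shared: decrement; the caller allocates fresh
        None
    }
}

// reuse: expansion of a Construct against a token. Writes the new fields
// into the recycled node when the token is live; returns whether the
// allocation was reused in place.
fn reuse_in_place(token: Option<&mut Node>, value: i64) -> bool {
    match token {
        Some(old) => {
            old.value = value; // in-place construction: no free, no alloc
            true
        }
        None => false, // slow path: caller constructs a fresh Node
    }
}
```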
5. RC Identity Normalization — Enable Elimination Across Projections
When v1 = Project(v0, field), both v1 and v0 share the same RC identity — incrementing v1 is the same as incrementing v0. The identity pass normalizes all RC operations to target canonical roots, so RcInc(v1) followed by RcDec(v0) can be recognized as a canceling pair. Without this, struct field manipulation leaves orphaned RC operations.
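A sketch of the normalization, assuming projection edges recorded during lowering (the `root` helper and the variable names are illustrative): chase each variable through its projection chain to the canonical root, then rewrite every RC operation to target that root.

```rust
use std::collections::HashMap;

// `proj` maps each projected variable to the value it was projected from,
// e.g. v1 = Project(v0, field) records proj["v1"] = "v0".
fn root<'a>(proj: &HashMap<&'a str, &'a str>, mut v: &'a str) -> &'a str {
    while let Some(&base) = proj.get(v) {
        v = base; // v shares its RC identity with the value it came from
    }
    v
}
```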
6. RC Elimination — Remove Redundant Pairs
A bidirectional intra-block dataflow pass finds RcInc(x) / RcDec(x) pairs with no intervening use and removes both. Cross-block elimination handles pairs split across basic block boundaries along single-predecessor edges. The pass cascades until fixpoint — removing one pair can expose another.
See RC Elimination.
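The intra-block core of the pass can be sketched as a cancel-and-repeat scan; the `Op` encoding below is illustrative.

```rust
#[derive(Clone, PartialEq, Debug)]
enum Op {
    Inc(&'static str),
    Dec(&'static str),
    Use(&'static str), // any instruction that reads the variable
}

// An Inc(x) followed by Dec(x) with no Use(x) between them cancels.
// Removing one pair can expose another, so the scan runs to fixpoint.
fn eliminate(mut ops: Vec<Op>) -> Vec<Op> {
    loop {
        let mut removed = false;
        'scan: for i in 0..ops.len() {
            if let Op::Inc(x) = ops[i] {
                for j in i + 1..ops.len() {
                    match ops[j] {
                        Op::Dec(y) if y == x => {
                            ops.remove(j); // j > i, so remove j first
                            ops.remove(i);
                            removed = true;
                            break 'scan;
                        }
                        // Conservative: an intervening use (or another Inc
                        // of the same variable) blocks this pairing.
                        Op::Use(y) | Op::Inc(y) if y == x => break,
                        _ => {}
                    }
                }
            }
        }
        if !removed {
            return ops;
        }
    }
}
```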
7. COW Collections — O(1) Mutation for Unique Owners
At runtime, every collection mutation (list push, map insert, set remove) checks ori_rc_is_unique(). If the collection has RC == 1, it mutates in-place. If shared (RC > 1), it copies the buffer and decrements the old ref. This is what makes value semantics practical for collections — without COW, list.push(x) in a loop is O(n^2).
See Collections & COW.
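The check-then-mutate pattern can be sketched with Rust's Rc standing in for Ori's heap headers and ori_rc_is_unique(); the `push_cow` helper is illustrative.

```rust
use std::rc::Rc;

// COW push: mutate in place when the buffer is uniquely owned, copy then
// mutate when it is shared. Returns true when the fast path was taken.
fn push_cow(list: &mut Rc<Vec<i64>>, x: i64) -> bool {
    if let Some(v) = Rc::get_mut(list) {
        v.push(x); // RC == 1: in-place, O(1) amortized
        true
    } else {
        let mut copy = (**list).clone(); // shared: copy the buffer...
        copy.push(x);
        *list = Rc::new(copy); // ...and drop our reference to the old one
        false
    }
}
```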
8. Static Uniqueness Analysis — Eliminate Runtime Checks
The final layer proves at compile time that certain values are uniquely owned, allowing codegen to emit only the fast path — no IsShared check, no branch, no slow-path code. The analysis uses a forward dataflow lattice (Unique / MaybeShared / Shared) with the key insight that COW operations always produce unique results.
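The lattice and its two key operations can be sketched directly; the names are illustrative.

```rust
// Forward-dataflow lattice for static uniqueness.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Uniq {
    Unique,      // provably RC == 1: emit the fast path, no branch
    MaybeShared, // unknown: keep the runtime IsShared check
    Shared,      // provably aliased: slow path
}

// Join at control-flow merges: disagreeing paths lose precision.
fn join(a: Uniq, b: Uniq) -> Uniq {
    use Uniq::*;
    match (a, b) {
        (Unique, Unique) => Unique,
        (Shared, Shared) => Shared,
        _ => MaybeShared,
    }
}

// Transfer function for a COW mutation: whatever flowed in, the result is
// uniquely owned (either a fresh copy or the proven-unique original).
fn after_cow(_input: Uniq) -> Uniq {
    Uniq::Unique
}
```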
The Compounding Effect
Each technique handles a different dimension of the problem, and they multiply:
| Without | Cost | With | Residual |
|---|---|---|---|
| Classification | RC ops on ints, bools | Classification | Zero RC on ~50% of variables |
| Borrow inference | RC at every call site | + Borrow inference | Zero RC for read-only params |
| Precise insertion | RC at scope boundaries | + Perceus | RC only at true last-use |
| Reset/reuse | alloc+free per constructor | + Reset/reuse | In-place reuse when unique |
| RC elimination | Redundant inc/dec pairs | + Elimination | Minimal surviving RC ops |
| Identity normalization | Orphaned ops on projections | + Normalization | No leaked projection RC |
| COW runtime | O(n) per collection mutation | + COW | O(1) when unique |
| Static uniqueness | Branch per mutation | + Uniqueness | Branchless fast path |
The net result: a program written in pure value semantics — no mutation syntax, no ownership annotations, no lifetime parameters — compiles to code that mutates in-place, reuses allocations, skips RC on scalars and borrowed params, and eliminates branches on provably-unique values.
Architecture
Pipeline Position
The ARC system sits at the fork point between interpretation and native compilation. Both paths consume the same canonical IR; the ARC system is the entry point for the native code path:
flowchart LR
Source["Source .ori"] --> Lex["Lex"]
Lex --> Parse["Parse"]
Parse --> TypeCheck["Type Check"]
TypeCheck --> Canon["Canonicalize
CanExpr"]
Canon --> Eval["ori_eval
Tree-walking"]
Canon --> ARC["ori_arc
ARC Pipeline"]
ARC --> LLVM["ori_llvm
arc_emitter"]
classDef frontend fill:#1e3a5f,stroke:#60a5fa,color:#dbeafe
classDef canon fill:#3b1f6e,stroke:#a78bfa,color:#e9d5ff
classDef interpreter fill:#1a4731,stroke:#34d399,color:#d1fae5
classDef native fill:#5c3a1e,stroke:#f59e0b,color:#fef3c7
class Source,Lex,Parse,TypeCheck frontend
class Canon canon
class Eval interpreter
class ARC,LLVM native
Pipeline Passes
The ARC pipeline runs in a strict, load-bearing order — each pass depends on the output of prior passes. Reordering or skipping passes produces incorrect code.
flowchart TB
Lower["Lower CanExpr
→ ArcFunction"] --> Global["Global Analysis"]
subgraph Global["Global Passes (per-module)"]
Borrow["1. Borrow Inference
SCC-based ownership"]
Apply["2. Apply Borrows
Annotate params"]
Borrow --> Apply
end
subgraph Unique["Interprocedural Analysis"]
UA["Uniqueness Analysis
COW elimination"]
end
Global --> Unique
Unique --> PerFn
subgraph PerFn["Per-Function Pipeline"]
VarRepr["1. Compute Var Reprs"]
DerivedOwn["2. Derived Ownership"]
Live1["3. Refined Liveness"]
COW["4. COW Annotations"]
RCIns["5. RC Insertion (Perceus)"]
DomTree["6. Dominator Trees"]
Live2["7. Re-Liveness"]
Detect["8. Reset/Reuse Detection"]
Expand["9. Reuse Expansion"]
Identity["10. RC Identity"]
Elim["11. RC Elimination"]
Hints["12. Drop Hints"]
VarRepr --> DerivedOwn --> Live1 --> COW --> RCIns
RCIns --> DomTree --> Live2 --> Detect --> Expand
Expand --> Identity --> Elim --> Hints
end
classDef native fill:#5c3a1e,stroke:#f59e0b,color:#fef3c7
class Lower,Borrow,Apply,UA native
class VarRepr,DerivedOwn,Live1,COW,RCIns,DomTree,Live2,Detect,Expand,Identity,Elim,Hints native
Each dependency is structural, not incidental:
- Borrow inference needs classification (to skip scalars)
- RC insertion needs borrow ownership (to skip borrowed params) and liveness (to find last-use points)
- Reset/reuse needs RC insertion (to find RcDec/Construct pairs)
- Reuse expansion needs detection (to know what to expand)
- RC elimination needs identity normalization (to match projected pairs)
- Static uniqueness needs COW paths (to prove results are unique)
Entry Points
| Function | Purpose |
|---|---|
| run_arc_pipeline() | Single function — assumes borrow signatures are already cached; runs the per-function passes |
| run_arc_pipeline_all() | Batch — infers borrows globally, then runs the per-function pipeline on each function |
| run_uniqueness_analysis() | Interprocedural uniqueness — SCC-based fixpoint for COW elimination |
Consumers always go through these entry points; passes are never sequenced manually.
Prior Art
Lean 4 is the most direct influence on Ori’s ARC system. Lean 4’s compiler (described in Counting Immutable Beans, Ullrich and de Moura, 2019) pioneered several techniques that Ori adopts: the three-way type classification (isScalar / isDefiniteRef / isPossibleRef), borrow inference for function parameters, reset/reuse for in-place allocation recycling, and the overall strategy of stacking RC optimizations into a pipeline. Lean’s implementation lives in src/Lean/Compiler/IR/RC.lean (RC insertion), Borrow.lean (borrow inference), and ExpandResetReuse.lean (reuse expansion). Ori’s pipeline follows Lean’s pass ordering closely, with additions for COW collection optimization and static uniqueness analysis that Lean does not perform.
Koka contributed the Perceus algorithm (Reinking, Xie, de Moura, Leijen, 2021) — the precise, liveness-based RC insertion strategy that Ori uses. Koka also introduced FBIP (Functional But In-Place, Lorenzen, Leijen, et al., 2023) — the idea that functional programs can be analyzed to determine whether all allocations are reusable, enabling guaranteed in-place mutation. Ori implements FBIP checking as a diagnostic pass (fbip/) that verifies reuse goals were achieved. Koka’s borrow analysis (src/Core/Borrowed.hs) and FBIP checker (src/Core/CheckFBIP.hs) are reference implementations.
Swift popularized ARC as a mainstream language feature and developed sophisticated ARC optimization passes in its SIL (Swift Intermediate Language) optimizer. Swift’s approach differs from Ori’s: Swift uses bidirectional dataflow analysis to find matching retain/release pairs (lib/SILOptimizer/ARC/), while Ori uses Perceus-style liveness-based insertion followed by elimination. Swift also does not perform borrow inference at the IR level — Swift’s ownership model is expressed in the source language through move semantics and borrowing annotations, while Ori infers ownership automatically.
Rust solves the same problem through a fundamentally different mechanism: ownership and borrowing are part of the type system, enforced by the borrow checker. Rust never uses reference counting for owned values (only for explicitly opt-in Rc<T> / Arc<T>). The tradeoff is well-known: Rust gets zero-overhead memory management, but programmers must satisfy the borrow checker. Ori chooses the opposite point in the design space — no borrow checker, no lifetime annotations, with the compiler inferring where RC operations can be eliminated.
CPython uses naive reference counting with a cycle collector backup. Every reference copy increments, every destruction decrements, with no compile-time optimization. This is the baseline that illustrates why optimization matters — CPython’s RC overhead is substantial, and its cycle collector adds GC pauses on top of it. Ori’s pipeline represents the opposite extreme: aggressive compile-time analysis to minimize the runtime RC operations that CPython performs unconditionally.
Design Tradeoffs
ARC vs tracing GC. Ori chose ARC over tracing garbage collection for three reasons: deterministic destruction (values are freed immediately when their last reference disappears, enabling predictable resource cleanup), low latency (no stop-the-world pauses), and compatibility with value semantics (the RC provides a natural uniqueness check for COW optimization). The cost is that ARC cannot collect reference cycles — Ori’s value semantics and lack of shared mutable references prevent cycles in normal code, but future features (e.g., graph data structures) may need a cycle-breaking mechanism.
ARC vs ownership types. Ori chose ARC over Rust-style ownership and borrowing to reduce programmer burden. The tradeoff is that ARC has runtime overhead (atomic increment/decrement operations on reference counts) where Rust has zero, and ARC requires a more complex compiler pipeline. Ori’s position is that the optimization pipeline can eliminate enough overhead to make the difference negligible for most programs, while providing a simpler programming model.
Interprocedural vs local analysis. Borrow inference and uniqueness analysis are interprocedural — they analyze the entire module. This produces better results (more parameters marked as Borrowed, more values proven Unique) but has higher compile-time cost and implementation complexity. A per-function analysis would be simpler but would miss cross-function optimization opportunities. The interprocedural approach is justified because borrow inference convergence is fast in practice (SCC decomposition bounds the iteration count), and the RC elimination it enables is worth the compile-time cost.
Backend-independent ARC IR vs backend-specific optimization. The ARC pipeline produces backend-independent IR, which means backend-specific optimizations (like LLVM’s own dead-code elimination or constant propagation) cannot feed back into ARC decisions. The alternative — embedding ARC decisions in the backend — would enable more aggressive optimization but would duplicate the pipeline for each backend. Backend independence was chosen because Ori targets multiple backends (LLVM for native, WASM in the future), and the ARC optimizations are algorithmic (dataflow-based) rather than target-specific.
Perceus vs scope-based RC. Most ARC implementations insert RC at scope boundaries — increment when entering a scope, decrement when leaving. Perceus inserts at the precise point of last use, which is more aggressive but requires liveness analysis. The benefit is that later passes (reset/reuse, elimination) have more information to work with. The cost is an extra analysis pass and more complex insertion logic.
Debugging
| Tool | Command |
|---|---|
| Tracing | ORI_LOG=ori_arc=debug ori build file.ori |
| Verbose tracing | ORI_LOG=ori_arc=trace ORI_LOG_TREE=1 ori build file.ori |
| ARC IR dump | ORI_DUMP_AFTER_ARC=1 ori build file.ori |
| RC trace (runtime) | ORI_TRACE_RC=1 ./binary |
| Leak check | ORI_CHECK_LEAKS=1 ./binary |
| RC balance | diagnostics/rc-stats.sh file.ori |
| Codegen audit | diagnostics/codegen-audit.sh --strict file.ori |
| Full diagnosis | diagnostics/diagnose-aot.sh --valgrind file.ori |
| Interpreter vs AOT | diagnostics/dual-exec-debug.sh file.ori |
Related Documents
- ARC IR — IR definitions, type classification, value representations
- Lowering — CanExpr to ARC IR conversion
- Borrow Inference — Global ownership inference
- Liveness — Backward dataflow analysis
- RC Insertion — Perceus algorithm
- Reset/Reuse — In-place constructor reuse
- RC Elimination — Redundant pair removal
- Drop Descriptors — Per-type drop generation
- Decision Trees — Pattern compilation in ARC IR
- ARC Emitter — ARC IR to LLVM IR translation
- Runtime RC — Runtime RC implementation