Builtins Codegen

Why Inline Codegen for Built-in Methods?

When a programmer writes list.len(), the simplest implementation would be to compile it as a function call: emit a call to ori_list_len(list), let the runtime extract the length field, and return it. This is correct, but it is also wasteful — list.len() is a single field extraction from a known struct layout. The “function call” crosses the compiler-runtime boundary, pushes arguments, saves registers, and returns through the call stack, all to read one integer from a fixed offset.

The alternative is inline codegen: instead of calling a function, the compiler emits the LLVM instructions directly. For list.len(), this means a single extractvalue instruction that pulls the length field from the list’s { i64, i64, ptr } representation. No function call, no ABI overhead, no register spilling. The result is the same, but the generated code is dramatically faster.

This trade-off — inline code generation versus runtime function calls — applies to hundreds of built-in methods across all of Ori’s types. Some methods are simple enough to inline entirely (.len(), .is_empty(), .is_some()). Others require complex logic that belongs in the runtime (.split(), .sort(), .format()). The builtins codegen system manages this decision for every built-in method in the language.

Classical Approaches

Everything as runtime calls (the CPython approach) — every operation, including len(), +, and [], dispatches through the runtime. This is simple to implement but slow. CPython’s method dispatch goes through PyObject_GetAttr, hash table lookups, and descriptor protocol handling for every single method call.

Everything inlined (the early C compiler approach) — the compiler knows about every built-in type and operation, generating specialized code for each. This produces fast code but makes the compiler monolithic — adding a new type or method requires modifying the compiler’s codegen.

Intrinsics (the LLVM/GCC approach) — certain functions are recognized by the compiler and replaced with specialized instruction sequences. LLVM’s @llvm.memcpy, @llvm.sqrt, and @llvm.ctpop are examples. The compiler pattern-matches on function names and replaces them with target-specific instruction sequences.

Tiered dispatch (the V8/JavaScriptCore approach) — hot methods get specialized inline implementations, warm methods use runtime calls, and cold methods use generic dispatch. The tier boundaries are determined by profiling data.

Where Ori Sits

Ori uses a declarative registration system where each built-in method is registered with a handler function that emits LLVM IR. Simple methods emit inline instructions; complex methods emit calls to runtime functions. The declare_builtins! macro generates both the dispatch logic and the registration table from a single declaration, guaranteeing that registration and dispatch stay synchronized.

What Makes Ori’s Builtins System Distinctive

Macro-Generated Dual Artifacts

A single declare_builtins! invocation in each submodule generates two artifacts simultaneously:

  1. dispatch(emitter, ctx) -> Option<ValueId> — a match cascade on (type_name, method) pairs that routes to the correct handler. Returns Some(value) if the builtin was handled, None if it should fall through to normal method dispatch.

  2. REGISTERED: &[BuiltinRegistration] — an enumerable list of all registered builtins, used for sync tests and the BuiltinTable.

This dual-generation design makes an entire class of bugs impossible: you cannot register a builtin without providing a dispatch handler, and you cannot dispatch to an unregistered builtin. The macro is the single source of truth for what builtins exist and how they are dispatched.
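The dual-generation pattern can be illustrated with a minimal, hypothetical sketch — the real macro's grammar, handler signature, and `ValueId` plumbing differ, and the `i64` handlers here are stand-ins for IR emission:

```rust
// Hypothetical simplified sketch: one macro invocation generates BOTH a
// dispatch function and an enumerable registration list, so the two
// artifacts cannot drift apart.

struct Registration {
    type_name: &'static str,
    method: &'static str,
    receiver_borrowed: bool,
}

macro_rules! declare_builtins {
    ($( ($ty:literal, $m:literal, $handler:ident, $borrowed:literal) ),* $(,)?) => {
        // Artifact 1: the dispatch match cascade on (type_name, method).
        fn dispatch(type_name: &str, method: &str, receiver: i64) -> Option<i64> {
            match (type_name, method) {
                $( ($ty, $m) => Some($handler(receiver)), )*
                _ => None, // fall through to normal method dispatch
            }
        }

        // Artifact 2: the enumerable registration table.
        static REGISTERED: &[Registration] = &[
            $( Registration { type_name: $ty, method: $m, receiver_borrowed: $borrowed }, )*
        ];
    };
}

// Toy handlers standing in for real IR-emitting functions.
fn emit_int_abs(v: i64) -> i64 { v.abs() }
fn emit_int_is_zero(v: i64) -> i64 { (v == 0) as i64 }

declare_builtins! {
    ("int", "abs", emit_int_abs, true),
    ("int", "is_zero", emit_int_is_zero, true),
}

fn main() {
    assert_eq!(dispatch("int", "abs", -3), Some(3));
    assert_eq!(dispatch("str", "len", 0), None); // unregistered: falls through
    assert_eq!(REGISTERED.len(), 2);
    println!("{}.{} registered", REGISTERED[0].type_name, REGISTERED[0].method);
}
```

Because both artifacts expand from the same token list, deleting a registration line removes it from dispatch and from the table in one step.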

Sync Testing Against the Type Checker

The type checker maintains its own list of built-in methods (TYPECK_BUILTIN_METHODS) — the methods it allows to be called on built-in types without requiring an explicit impl block. The builtins codegen system maintains its own list of methods it can handle. If these lists drift — the type checker allows str.reverse() but codegen doesn’t handle it, or codegen handles int.clamp() but the type checker doesn’t allow it — the program will either fail at codegen (missing handler) or silently fall through to runtime dispatch (unnecessary overhead).

Sync tests automatically iterate both lists and verify they match. Adding a new built-in method to one side without the other triggers a test failure. The TYPECK_BUILTIN_METHODS list must be sorted alphabetically by (type, method) — a constraint enforced by test — which prevents accidental duplicates and makes the list easy to search.
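A sync test of this shape can be sketched as follows — the two constant lists here are hypothetical stand-ins for `TYPECK_BUILTIN_METHODS` and the codegen table:

```rust
// Hypothetical sketch of a sync test: both sides become sorted sets of
// (type, method) pairs; any one-sided addition fails the equality check.
use std::collections::BTreeSet;

// Stand-ins for the type checker's and codegen's real lists.
const TYPECK: &[(&str, &str)] = &[("int", "abs"), ("list", "len"), ("str", "len")];
const CODEGEN: &[(&str, &str)] = &[("int", "abs"), ("list", "len"), ("str", "len")];

fn main() {
    // Sorted-order invariant: strictly increasing, which also rules out
    // duplicates and keeps the list easy to search.
    assert!(TYPECK.windows(2).all(|w| w[0] < w[1]), "TYPECK list must be sorted");

    let a: BTreeSet<_> = TYPECK.iter().collect();
    let b: BTreeSet<_> = CODEGEN.iter().collect();
    assert_eq!(a, b, "type checker and codegen builtin lists drifted");
    println!("in sync");
}
```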

Receiver Borrowing Metadata

Each builtin registration includes a receiver_borrowed flag that the ARC emitter uses to decide whether to increment the receiver’s reference count before the call. Borrowed receivers (.len(), .is_empty(), .contains()) skip the increment — the method only reads the receiver, so there is no need to extend its lifetime. Owned receivers (methods that consume or modify the receiver) get the normal RC treatment.

This per-method borrowing information is not available from the type system alone — it is a codegen-level optimization that depends on the method’s implementation, not its signature.

BuiltinCtx

Every handler function receives a BuiltinCtx containing everything needed to emit correct LLVM IR:

| Field | Type | Purpose |
| --- | --- | --- |
| type_name | &str | Ori type being dispatched on (e.g., "list", "str") |
| method | &str | Method name being called (e.g., "len", "push") |
| arg_vals | &[ValueId] | LLVM values for all arguments; receiver is arg_vals[0] |
| receiver_ty | Idx | Pool type index for the receiver |
| type_info | &TypeInfo | Full type information (inner types, field layout) |
| arc_args | &[ArcArg] | ARC IR argument metadata (for type lookups) |
| arc_func | &ArcFunction | Enclosing ARC function (for additional type context) |
| result_ty | Idx | Pool type index for the expected return type |

Handlers return Option<ValueId>: Some(value) for successfully emitted IR, None to signal fallthrough to normal method dispatch.
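A handler's shape can be sketched with a toy emitter — the emitter API and `BuiltinCtx` fields here are illustrative, not the real interface:

```rust
// Hypothetical sketch: list.len() compiles to one extractvalue pulling
// the length field (index 0) from the list's { i64, i64, ptr } aggregate.
type ValueId = usize;

#[derive(Default)]
struct Emitter {
    ir: Vec<String>, // textual IR, standing in for real instruction emission
    next: ValueId,
}

impl Emitter {
    fn extract_value(&mut self, agg: ValueId, idx: u32) -> ValueId {
        self.next += 1;
        self.ir.push(format!("%{} = extractvalue %{}, {}", self.next, agg, idx));
        self.next
    }
}

struct BuiltinCtx<'a> {
    arg_vals: &'a [ValueId],
}

fn emit_list_len(e: &mut Emitter, ctx: &BuiltinCtx) -> Option<ValueId> {
    let receiver = ctx.arg_vals[0]; // receiver is always arg_vals[0]
    Some(e.extract_value(receiver, 0)) // length lives at aggregate index 0
}

fn main() {
    let mut e = Emitter::default();
    let ctx = BuiltinCtx { arg_vals: &[7] };
    let v = emit_list_len(&mut e, &ctx).unwrap();
    println!("result %{}: {}", v, e.ir[0]);
}
```

Returning `None` instead of `Some(...)` would send this call back through normal method dispatch.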

Built-in Method Coverage

The builtins system covers every type in the language. Each submodule handles a coherent group of types and methods.

Primitives

Numeric, boolean, character, byte, Duration, Size, and Ordering methods. Most emit 1–3 LLVM instructions.

| Type | Methods | Typical Emission |
| --- | --- | --- |
| int | clone, abs, to_float, min, max, clamp, pow, is_positive, is_negative, is_zero | icmp + select for abs; sitofp for to_float |
| float | clone, abs, floor, ceil, round, to_int, sqrt, is_nan, is_infinite, is_finite, min, max | LLVM intrinsics: @llvm.fabs, @llvm.floor, @llvm.sqrt |
| bool | clone, to_int | zext i1 to i64 for to_int |
| char | clone, to_int, to_str, is_alphabetic, is_numeric, is_whitespace, to_upper, to_lower | zext i32 to i64 for to_int; runtime calls for Unicode |
| byte | clone, to_int, to_char | Zero-extension instructions |
| Duration | clone, nanoseconds, microseconds, milliseconds, seconds, minutes, hours | Division by duration constants |
| Size | clone, bytes, kilobytes, megabytes, gigabytes, terabytes | Division by size constants |
| Ordering | clone, is_less, is_equal, is_greater, reverse, then, then_with | icmp eq i8 %v, -1 for is_less |

int.clone() and bool.clone() are identity operations — the value is already in a register, and “copying” it means returning the same register. These exist so that generic code that calls .clone() on any T: Clone works uniformly for primitives.

Strings

String methods balance inline emission and runtime calls. Layout access (length, empty check) is inlined; content manipulation (split, trim, case conversion) delegates to the runtime because UTF-8 string processing requires non-trivial logic.

| Pattern | Methods | Implementation |
| --- | --- | --- |
| Layout access | len, is_empty | extractvalue from { i64, ptr } |
| Identity | to_str | Return receiver unchanged |
| Content queries | contains, starts_with, ends_with | Runtime: ori_str_contains, etc. |
| Transformation | split, trim, upper, lower | Runtime: returns new strings |
| Concatenation | + operator | ori_str_concat_sso (SSO-aware) |
| Debug | debug | Runtime: adds quote escaping |
| Iteration | iter | ori_iter_from_str |

Lists

List operations span the full range from simple inline code to complex runtime interactions:

| Pattern | Methods | Implementation |
| --- | --- | --- |
| Layout access | len, is_empty | extractvalue from { i64, i64, ptr } |
| Element access | first, last | Bounds check + GEP to element |
| Search | contains | Inline linear scan with equality check loop |
| Order | reverse, sort | Runtime calls; sort uses comparison thunks |
| COW mutations | push, pop, insert, remove, set | Runtime COW functions (see below) |
| Iteration | iter | ori_iter_from_list |
| Deep copy | clone | ori_list_clone + element RC inc loop |

COW List Operations

The Copy-on-Write list operations are the most complex builtins. Each COW function receives the list’s components (data pointer, length, capacity), the element to operate on, the element size, and element RC callbacks — function pointers that know how to increment or decrement the reference count of a single element.

| Method | Runtime Function | Element Callbacks |
| --- | --- | --- |
| push | ori_list_push_cow | inc + dec |
| pop | ori_list_pop_cow | dec only |
| set | ori_list_set_cow | inc + dec |
| insert | ori_list_insert_cow | inc + dec |
| remove | ori_list_remove_cow | dec only |
| concat | ori_list_concat_cow | inc + dec |

The element callbacks are generated by element_fn_gen.rs — one extern "C" fn(*mut u8) for increment, one for decrement, per element type. For [int], both callbacks are null (integers need no RC). For [str], the callbacks increment/decrement the string’s data pointer. For [[int]], the callbacks handle the nested list’s RC. These are cached per type to avoid regeneration.
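The callback shape can be sketched as follows — the function bodies and the type-to-callback mapping here are hypothetical, standing in for what element_fn_gen.rs generates:

```rust
// Hypothetical sketch of per-element RC callbacks: one extern "C"
// function pointer per element type, None (null) for RC-free elements.
type ElemRcFn = Option<unsafe extern "C" fn(*mut u8)>;

unsafe extern "C" fn str_elem_inc(_elem: *mut u8) {
    // Would increment the RC behind the str element's data pointer.
}

unsafe extern "C" fn str_elem_dec(_elem: *mut u8) {
    // Would decrement the RC and free the allocation at zero.
}

// In the real system these pairs are generated and cached per element type.
fn element_callbacks(elem_ty: &str) -> (ElemRcFn, ElemRcFn) {
    match elem_ty {
        "int" | "bool" | "float" => (None, None), // no RC work needed
        "str" => (Some(str_elem_inc), Some(str_elem_dec)),
        _ => (None, None), // placeholder for other element types
    }
}

fn main() {
    assert!(element_callbacks("int").0.is_none()); // [int]: both null
    assert!(element_callbacks("str").0.is_some()); // [str]: real callbacks
    println!("callbacks ok");
}
```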

Maps and Sets

Map and set operations follow the same pattern as lists — layout access is inlined, mutations use COW runtime functions with key/value RC callbacks:

| Type | Methods |
| --- | --- |
| map | len, is_empty, get, contains_key, insert, remove, entries, keys, values, iter |
| set | len, is_empty, contains, insert, remove, union, intersection, difference, iter |

Map operations additionally pass key hash and equality callbacks for type-generic hashing — the runtime’s hash map implementation doesn’t know how to hash an arbitrary key type, so the compiler generates key-specific hash and equality functions.

Option and Result

Option and Result methods are almost entirely inlined, leveraging the { i8, T } tagged representation:

| Pattern | Methods | Implementation |
| --- | --- | --- |
| Tag check | is_some, is_none, is_ok, is_err | Single icmp on tag byte |
| Unwrap | unwrap, unwrap_or | Branch on tag, extract payload or use default |
| Transform | map, and_then, filter | Branch on tag, call closure in Some/Ok arm |
| Convert | ok_or, ok, err | Construct new tagged value |
| Context | context | Wrap error with context string |

is_some/is_none emit exactly one LLVM instruction: icmp eq i8 %tag, 1 (or 0). These are among the simplest builtins in the system.

Iterators

Iterator methods split into adapters (which construct new iterator representations) and consumers (which drive the iteration loop):

| Pattern | Methods | Implementation |
| --- | --- | --- |
| Adapters | map, filter, take, skip, enumerate, zip, chain, flatten, flat_map, cycle | Construct runtime iterator adapter |
| Consumers | count, any, all, find, fold, for_each, collect | Inline loop with early exit |
| Next | next | Runtime dispatch on iterator variant |
| Double-ended | rev, last, rfind, rfold | Runtime adapter or consumer |

Consumer methods emit inline loops rather than function calls. A call such as list.iter().count() emits a loop that invokes ori_iter_next repeatedly, counting non-None results, with an early-exit optimization for iterators with known size hints.
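The emitted count loop is semantically equivalent to this sketch (names are illustrative; the real loop is emitted as IR, not source):

```rust
// What the inlined consumer loop for .count() computes: call "next"
// until it yields None, counting the hits.
fn iter_count(mut next: impl FnMut() -> Option<i64>) -> i64 {
    let mut count = 0;
    while next().is_some() {
        count += 1;
    }
    count
}

fn main() {
    let mut it = [10i64, 20, 30].into_iter();
    let n = iter_count(|| it.next());
    println!("count = {}", n);
}
```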

Traits

Trait method dispatch for comparison, equality, and hashing delegates to type-specific implementations:

| Trait | Method | Dispatch |
| --- | --- | --- |
| Eq | equals | Type-specific: ori_str_eq for strings, icmp for ints |
| Comparable | compare | Type-specific: ori_compare_int, string comparison |
| Hashable | hash | Type-specific hash computation |

Compound Types

to_str and debug for structs, enums, and tuples generate inline IR that:

  1. Allocates a string buffer
  2. Appends the type/variant name
  3. Iterates fields, calling each field’s to_str/debug recursively
  4. Joins with commas and wraps in appropriate delimiters
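The steps above compute something equivalent to this sketch — the exact delimiters and field separators are illustrative, and the recursive per-field calls are represented by pre-rendered strings:

```rust
// Hypothetical sketch of what the inlined compound debug formatting
// computes: type name, then comma-joined fields in delimiters.
fn debug_struct(name: &str, fields: &[(&str, String)]) -> String {
    let body: Vec<String> = fields
        .iter()
        .map(|(field, value)| format!("{field}: {value}")) // per-field recursion
        .collect();
    format!("{name}({})", body.join(", "))
}

fn main() {
    let s = debug_struct("Point", &[("x", "1".into()), ("y", "2".into())]);
    println!("{s}");
}
```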

Prelude Functions

Prelude functions (print, assert, assert_eq, panic, dbg, etc.) are not methods but use the same builtin infrastructure. They emit calls to runtime functions (ori_print, ori_assert, ori_panic) with type-specific argument preparation.

BuiltinTable

The BuiltinTable provides O(1) lookup for builtin existence and metadata. It is a two-level FxHashMap: type name → method name → BuiltinRegistration. Built once from REGISTERED at module initialization, it serves three purposes:

  1. Early rejection — before attempting builtin dispatch, check if (type, method) is registered. This avoids entering the match cascade for methods that will definitely fall through.

  2. Sync tests — test infrastructure iterates the table to verify parity with TYPECK_BUILTIN_METHODS.

  3. Receiver metadata — the receiver_borrowed flag drives ARC behavior at call sites.
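The two-level structure can be sketched with std's HashMap standing in for FxHashMap (the registration fields and builder are simplified):

```rust
// Sketch of the two-level lookup: type name -> method name -> metadata.
use std::collections::HashMap;

#[derive(Clone, Copy)]
struct Registration {
    receiver_borrowed: bool,
}

fn build_table(
    registered: &[(&'static str, &'static str, bool)],
) -> HashMap<&'static str, HashMap<&'static str, Registration>> {
    let mut table: HashMap<_, HashMap<_, _>> = HashMap::new();
    for &(ty, method, borrowed) in registered {
        table
            .entry(ty)
            .or_default()
            .insert(method, Registration { receiver_borrowed: borrowed });
    }
    table
}

fn main() {
    let table = build_table(&[("list", "len", true), ("list", "push", false)]);

    // Early rejection: O(1) existence check before the match cascade.
    assert!(table.get("list").and_then(|m| m.get("len")).is_some());
    assert!(table.get("str").is_none());

    // Receiver metadata drives ARC behavior at the call site.
    assert!(table["list"]["len"].receiver_borrowed);
    assert!(!table["list"]["push"].receiver_borrowed);
    println!("table ok");
}
```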

Adding a New Built-in Method

Adding a new built-in method (for example, str.repeat) requires at most four changes:

  1. Implement the handler in the appropriate submodule. The handler receives a BuiltinCtx and returns Option<ValueId>.

  2. Register in declare_builtins! within the same submodule: ("str", "repeat", emit_str_repeat, true). The macro handles dispatch routing and table registration.

  3. Add to TYPECK_BUILTIN_METHODS in the type checker. Must be alphabetically sorted by (type, method).

  4. Add runtime function (if needed) to RT_FUNCTIONS in runtime_functions.rs and implement it in ori_rt.

No other files need modification. The sync tests verify that all registrations match. The macro handles dispatch and table population.

Prior Art

rustc — Rust’s codegen does not have a comparable “builtins” system because Rust’s built-in methods are implemented as actual trait impls (e.g., impl Add for i32) that go through normal monomorphization and codegen. LLVM intrinsics are used for specific operations (@llvm.ctpop for count_ones, @llvm.fabs for f64::abs), but there is no dispatch table for method-level builtin handling. Ori’s approach is necessary because Ori’s interpreter and LLVM backend share a language-level method interface that doesn’t map directly to trait impls.

V8 (JavaScript) — V8’s “Torque” language defines built-in functions in a domain-specific language that generates both the interpreter bytecode handlers and the optimizing compiler’s inline code. This is structurally similar to Ori’s declare_builtins! macro — a single source generates both dispatch and implementation. V8’s approach is more sophisticated (Torque generates multiple tiers of code), but the principle of single-source builtin definition is the same.

GHC (Haskell) — GHC’s “primops” are built-in operations that bypass normal function call overhead. Each primop has a codegen handler that emits specialized assembly or LLVM IR. GHC’s primops.txt.pp file serves a similar role to Ori’s declare_builtins! — it is a single source that generates both the type checker’s knowledge of primops and the codegen handlers.

Zig — Zig’s built-in functions (@addWithOverflow, @memcpy, @typeInfo) are handled directly by the compiler’s codegen. Each builtin has a dedicated handler in Sema.zig or codegen/llvm.zig. Zig’s approach is simpler (no macro or registration table) but less systematic — adding a new builtin requires editing multiple files manually.

Design Tradeoffs

Macro registration vs. manual dispatch. The declare_builtins! macro adds compile-time complexity (macro expansion, generated code) in exchange for eliminating registration drift. A manual approach — separate dispatch function and registration array — would be simpler to read but would require developers to update two places when adding a builtin. The macro’s enforcement guarantee is worth the abstraction cost.

Inline codegen vs. runtime calls. Each builtin handler must decide whether to emit inline instructions or call a runtime function. The heuristic is: if the operation can be expressed in 1–5 LLVM instructions with no loops, inline it. If it requires loops, heap allocation, or complex logic (UTF-8 string processing, hash table operations), call the runtime. The boundary is fuzzy — contains on lists uses an inline loop (it’s a performance-critical path), while sort uses a runtime call (the sorting algorithm is complex).

Per-method borrowing vs. type-level borrowing. The receiver_borrowed flag is per-method, not per-type. An alternative — marking entire types as “always borrowed for reads” — would be simpler but incorrect for methods like clone() that need ownership semantics even though they only “read” the value. Per-method granularity trades registration verbosity for correct RC behavior.

Single BuiltinCtx vs. method-specific contexts. All handlers receive the same BuiltinCtx, even though some handlers use only a few fields. A method-specific context (different struct for collection methods vs. primitive methods) would be more precise but would fragment the API and make the macro more complex. The uniform context trades a few unused fields for API simplicity.