100%

Section 02: List COW Operations

Context: Currently, ori_list_push_new() and all other list mutation functions in ori_rt unconditionally allocate a new buffer, copy all elements, perform the mutation, and return the new list. This makes every push O(n). For a loop that builds a list of N elements, the total cost is O(N²). With COW, the same loop is O(N) — identical to a mutable Vec<T> in Rust.

Reference implementations:

  • Swift Array.swift: _makeMutableAndUnique()_buffer.beginCOWMutation() → branch. Inline fast path, outlined slow path.
  • Roc list.zig: listAppend(list, elem, update_mode)update_mode is either InPlace or Immutable, determined by the compiler’s alias analysis.
  • Lean 4: Reset/Reuse pattern — if RC==1, reuse memory; if shared, allocate fresh.

Depends on: Section 01 (uniqueness check API, capacity management, growth strategy).


02.1 COW Push (Append)

File(s): compiler/ori_rt/src/lib.rs

Push is the highest-impact single operation. Most list construction patterns are repeated appends.

  • Replace ori_list_push_new with COW-aware ori_list_push: (2026-02-27, implemented as ori_list_push_cow with consuming semantics — sret ABI matching current codegen pattern. Uses RC-allocated data buffers. Fast path: unique+capacity=in-place O(1), unique+growth=realloc. Slow path: shared/empty=allocate+copy+dec-old. Codegen wiring deferred to §02.7)

    /// Appends `elem` to `list`. If the list's data buffer is uniquely owned
    /// and has sufficient capacity, writes in place (O(1)). Otherwise,
    /// allocates a new buffer with grown capacity, copies old elements,
    /// writes the new element, and decrements the old buffer's RC.
    ///
    /// Returns the (possibly new) OriList by value.
    ///
    /// Element RC: The element's RC is NOT incremented — the caller is
    /// responsible for ensuring the element is owned (Perceus handles this).
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_push(
        list: OriList,
        elem: *const u8,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList {
        let new_len = list.len + 1;
    
        if !list.data.is_null() && ori_rc_is_unique(list.data as *const u8) {
            // FAST PATH: unique owner
            if list.cap >= new_len {
                // Has capacity — write in place
                unsafe {
                    let dst = list.data.add((list.len as usize) * elem_size);
                    std::ptr::copy_nonoverlapping(elem, dst, elem_size);
                }
                return OriList { len: new_len, cap: list.cap, data: list.data };
            } else {
                // Needs growth — realloc (may extend in place)
                let new_cap = next_capacity(list.cap as usize, new_len as usize);
                let new_data = ori_rc_realloc(
                    list.data,
                    (list.cap as usize) * elem_size,
                    new_cap * elem_size,
                    elem_align,
                );
                unsafe {
                    let dst = new_data.add((list.len as usize) * elem_size);
                    std::ptr::copy_nonoverlapping(elem, dst, elem_size);
                }
                return OriList { len: new_len, cap: new_cap as i64, data: new_data };
            }
        } else {
            // SLOW PATH: shared or empty — allocate new
            let new_cap = next_capacity(0, new_len as usize);
            let new_data = ori_rc_alloc(new_cap * elem_size, elem_align);
            if !list.data.is_null() {
                // Copy old elements
                unsafe {
                    std::ptr::copy_nonoverlapping(
                        list.data as *const u8,
                        new_data,
                        (list.len as usize) * elem_size,
                    );
                }
                // Decrement old buffer (we no longer reference it)
                // NOTE: Only dec the buffer RC, NOT the elements — they're shared
                ori_rc_dec_no_children(list.data as *mut u8, ...);
            }
            unsafe {
                let dst = new_data.add((list.len as usize) * elem_size);
                std::ptr::copy_nonoverlapping(elem, dst, elem_size);
            }
            return OriList { len: new_len, cap: new_cap as i64, data: new_data };
        }
    }
  • Slow path element RC handling: When copying elements from a shared list to a new buffer, each element that is itself RC’d must be incremented (the new buffer now also references them). The codegen (§02.7) must emit element-wise ori_rc_inc calls after ori_list_push on the slow path. (2026-02-26, design documented; runtime uses byte-copy; codegen-side RC inc deferred to §02.7)

    Design decision — who increments element RCs:

    (a) Runtime function increments elements (recommended):

    • Pro: Single call from codegen, simpler emission
    • Pro: Runtime knows element layout via elem_size/elem_align
    • Con: Runtime needs element RC info (a function pointer or strategy enum)

    (b) Codegen increments elements:

    • Pro: Codegen already knows element types
    • Con: More complex emission, larger generated code
    • Con: Duplicated logic across operations

    Recommended: Option (a) — pass a drop_fn-style callback for element RC operations. The runtime calls it for each copied element on the slow path. This matches the existing drop function pattern in arc_emitter/drop_gen.rs.

  • Thread safety note: The uniqueness check uses Relaxed ordering. This is safe because: (2026-02-26, documented in ori_rc_is_unique doc comment — §01.1)

    • If RC==1, no other thread holds a reference (by definition)
    • If another thread is racing with ori_rc_inc, the value is being shared, and Relaxed may see either 1 or 2 — if it sees 1, the inc hasn’t happened yet, so we’re still the sole owner; if it sees 2, we correctly take the slow path
    • Release/Acquire is only needed for the dec path (to ensure visibility of writes before deallocation)
  • Unit tests (Rust): (2026-02-27, 5 tests in ori_rt/src/tests.rs: cow_push_to_empty_sentinel, cow_push_unique_with_capacity, cow_push_unique_needs_growth, cow_push_shared_list_copies, cow_push_1000_sequential_amortized. All pass with zero leaks verified via RC_LIVE_COUNT.)

    • Push to empty list (sentinel) → allocates, len=1, cap=MIN_CAPACITY
    • Push to unique list with capacity → in-place, same data pointer
    • Push to unique list without capacity → realloc, new data pointer, doubled cap
    • Push to shared list (RC=2) → new allocation, old untouched
    • 1000 sequential pushes → amortized O(1), ~10 reallocations
  • AOT integration test (compiler/ori_llvm/tests/aot/): (2026-02-27, test_coll_list_push and push_empty/push_multi tests in collections_ext.rs)

    @test tests {
        let list = [1, 2, 3]
        let list = list.push(4)
        assert_eq(list, [1, 2, 3, 4])
        assert_eq(list.length(), 4)
    }

02.2 COW Pop

File(s): compiler/ori_rt/src/lib.rs

Pop removes and returns the last element. With COW, if unique, we simply decrement len (the element is still in the buffer but inaccessible). If shared, we copy all-but-last.

  • Replace ori_list_pop_new with COW-aware ori_list_pop: (2026-02-27, implemented as ori_list_pop_cow with consuming semantics. Fast path: unique=decrement len O(1), capacity retained. Slow path: shared=allocate+copy len-1+dec old. Empty list returns sentinel.)

    /// Removes the last element from the list. Returns the shortened list.
    /// If unique: decrements len (O(1), element stays in buffer but is logically removed).
    /// If shared: allocates new buffer with len-1 elements, copies, decs old.
    ///
    /// The popped element must be extracted by the caller BEFORE calling pop
    /// (via list.last() or index access). Pop only shortens the list.
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_pop(
        list: OriList,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList {
        assert!(list.len > 0, "pop on empty list");
        let new_len = list.len - 1;
    
        if ori_rc_is_unique(list.data as *const u8) {
            // FAST PATH: just shrink len
            // NOTE: We do NOT free the popped element here — the caller
            // extracted it and owns it. We just shrink the logical size.
            OriList { len: new_len, cap: list.cap, data: list.data }
        } else {
            // SLOW PATH: allocate new, copy len-1 elements
            // ... (similar to push slow path but copies fewer elements)
        }
    }
  • Capacity reclamation: When a unique list’s len drops below cap / 4, consider shrinking. But this is a tradeoff — shrinking prevents memory waste but causes reallocation. Decision: Do NOT auto-shrink. Let capacity grow but never auto-shrink. Users can explicitly call list.compact() (future method) to reclaim. This matches Rust’s Vec behavior. (2026-02-27, implemented — fast path retains capacity, verified by cow_pop_to_empty_retains_buffer test)

  • Unit tests: (2026-02-27, 4 tests in ori_rt/src/tests.rs: cow_pop_unique_decrements_len, cow_pop_shared_copies, cow_pop_to_empty_retains_buffer, cow_pop_empty_list_returns_empty. All pass, zero leaks.)

    • Pop from unique list → same data pointer, len decremented
    • Pop from shared list → new allocation, old untouched
    • Pop to empty → len=0, data pointer still valid (capacity retained)

02.3 COW Set (Index Assignment)

File(s): compiler/ori_rt/src/lib.rs

Set replaces the element at a given index. With COW, if unique, we overwrite in place. If shared, we copy the whole list, then overwrite.

  • Replace ori_list_set_new with COW-aware ori_list_set: (2026-02-27, implemented as ori_list_set_cow with consuming semantics. Fast path: unique=overwrite in-place O(1). Slow path: shared=copy entire list+overwrite in copy O(n). Out-of-bounds returns input unchanged. Codegen wiring deferred to §02.7)

    /// Replaces the element at `index` with `elem`.
    /// If unique: overwrites in place (O(1)).
    /// If shared: copies list, overwrites in copy (O(n)).
    ///
    /// Element RC: The OLD element at `index` must be decremented by the caller
    /// (codegen emits this). The new element is moved in (no inc needed).
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_set(
        list: OriList,
        index: i64,
        elem: *const u8,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList {
        assert!(index >= 0 && index < list.len, "index out of bounds");
    
        if ori_rc_is_unique(list.data as *const u8) {
            // FAST PATH: overwrite in place
            unsafe {
                let dst = list.data.add((index as usize) * elem_size);
                std::ptr::copy_nonoverlapping(elem, dst, elem_size);
            }
            list // Return same list (data pointer unchanged)
        } else {
            // SLOW PATH: clone, then overwrite
            let new_data = ori_rc_alloc((list.cap as usize) * elem_size, elem_align);
            unsafe {
                std::ptr::copy_nonoverlapping(
                    list.data as *const u8, new_data,
                    (list.len as usize) * elem_size,
                );
                let dst = new_data.add((index as usize) * elem_size);
                std::ptr::copy_nonoverlapping(elem, dst, elem_size);
            }
            // inc all elements (they're now referenced by new buffer too)
            // dec old buffer
            OriList { len: list.len, cap: list.cap, data: new_data }
        }
    }
  • Unit tests: (2026-02-27, 3 tests in ori_rt/src/tests.rs: cow_set_unique_overwrites_in_place, cow_set_shared_copies, cow_set_at_index_zero. All pass, zero leaks.)

    • Set on unique list → same data pointer, element replaced
    • Set on shared list → new allocation, old list unchanged
    • Set at index 0, middle, and last position

02.4 COW Insert & Remove

File(s): compiler/ori_rt/src/lib.rs

Insert shifts elements right; remove shifts elements left. With COW, if unique, we shift in place using memmove. If shared, we copy with the shift built into the copy.

  • Add ori_list_insert: (2026-02-27, implemented as ori_list_insert_cow with consuming semantics. Fast path: unique+capacity=memmove right+write O(n). unique+growth=realloc+memmove+write. Slow path: shared=alloc+copy [0..idx]+elem+copy [idx..len]+dec old. Index 0..=len valid, OOB returns unchanged.)

    /// Inserts `elem` at `index`, shifting subsequent elements right.
    /// If unique and has capacity: memmove right + write (O(n) for shift, but
    /// no allocation — the n elements are already in cache).
    /// If shared: allocate new, copy [0..index], write elem, copy [index..len].
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_insert(
        list: OriList,
        index: i64,
        elem: *const u8,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList { ... }
  • Add ori_list_remove: (2026-02-27, implemented as ori_list_remove_cow with consuming semantics. Fast path: unique=memmove left O(n), unique+empty=ori_rc_free. Slow path: shared=alloc+copy [0..idx]+[idx+1..len]+dec old. Caller extracts element before call.)

    /// Removes element at `index`, shifting subsequent elements left.
    /// If unique: memmove left (O(n) for shift, no allocation).
    /// If shared: allocate new, copy [0..index] + [index+1..len].
    /// The removed element must be extracted by caller before this call.
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_remove(
        list: OriList,
        index: i64,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList { ... }
  • Unit tests: (2026-02-27, 11 tests: 6 insert (beginning/middle/end unique, growth, shared, empty) + 5 remove (beginning/middle/end unique, last-element-frees, shared). All pass, zero leaks.)

    • Insert at beginning, middle, end (unique and shared)
    • Remove from beginning, middle, end (unique and shared)
    • Insert that triggers growth (unique: realloc + shift, shared: fresh alloc)
    • Insert/remove on single-element list

02.5 COW Concat (List Concatenation)

File(s): compiler/ori_rt/src/lib.rs

Concat (list1 + list2) appends all elements of list2 to list1. With COW, if list1 is unique and has capacity for list2, we extend in place.

  • Replace ori_list_concat_new with COW-aware ori_list_concat: (2026-02-27, implemented as ori_list_concat_cow with consuming semantics for list1, borrowing list2. Fast path: list1 unique+capacity=memcpy list2 O(m). unique+growth=realloc+memcpy. Slow path: shared=alloc+copy both+dec old list1.)

    /// Concatenates list2 onto list1.
    /// If list1 is unique:
    ///   - Has capacity: memcpy list2 elements into list1's buffer (O(m))
    ///   - No capacity: realloc list1 to fit, then memcpy (O(n+m))
    /// If list1 is shared:
    ///   - Allocate new buffer for n+m elements, copy both (O(n+m))
    ///
    /// In all cases, elements of list2 need RC inc (they're now in list1 too).
    /// list2's buffer RC is NOT decremented — the caller manages list2's lifetime.
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_concat(
        list1: OriList,
        list2: OriList,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList { ... }
  • Optimization: consume list2 when unique: If list2 is also uniquely owned and has no remaining references, we can move its elements without incrementing their RCs. This requires passing a flag or checking list2’s uniqueness too. (2026-02-28, already implemented: ori_list_concat_cow checks data2_unique = ori_rc_is_unique(data2), copy_list2_elements skips inc_fn when list2_unique=true, dec_consumed_list2 frees buffer directly. 11 unit tests pass including cow_concat_both_unique_no_inc verifying zero inc_fn calls.)

    Decision: Check both lists’ uniqueness. Four cases:

    1. Both unique: move list2’s elements (no element RC changes), realloc list1 if needed
    2. list1 unique, list2 shared: copy list2’s elements (inc each), extend list1
    3. list1 shared, list2 unique: allocate new, copy list1 (inc each), move list2
    4. Both shared: allocate new, copy both (inc all elements)
  • Unit tests: (2026-02-27, 5 tests: cow_concat_unique_with_capacity, cow_concat_unique_needs_growth, cow_concat_shared_copies, cow_concat_empty_lists, cow_concat_empty_list1. All pass, zero leaks.)

    • Concat where list1 has capacity → no realloc
    • Concat where list1 needs growth → realloc
    • Concat with shared list1 → new allocation
    • Concat empty lists (sentinels)
    • Concat large lists (1000+ elements)
    • Concat list with itself (list + list)

02.6 COW Reverse & Sort

File(s): compiler/ori_rt/src/lib.rs

Reverse and sort are in-place algorithms when the list is unique.

  • Add COW ori_list_reverse: (2026-02-27, implemented as ori_list_reverse_cow with consuming semantics. Fast path: unique=swap loop in-place O(n). Slow path: shared=alloc+copy in reverse order+dec old. Single/empty elements returned unchanged.)

    /// Reverses the list.
    /// If unique: reverse in place using swap loop (O(n), no allocation).
    /// If shared: allocate new, copy in reverse order (O(n)).
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_reverse(
        list: OriList,
        elem_size: usize,
        elem_align: usize,
    ) -> OriList { ... }
  • Add COW ori_list_sort: (2026-02-27, implemented as ori_list_sort_cow with consuming semantics. Uses index-sort + cycle-following permutation. Fast path: unique=sort indices+permute in-place O(n log n). Slow path: shared=sort indices+copy in sorted order+dec old. compare_fn has C ABI.)

    /// Sorts the list using the provided comparison function.
    /// If unique: sort in place (O(n log n), no allocation).
    /// If shared: copy, then sort the copy (O(n) copy + O(n log n) sort).
    ///
    /// The comparison function has signature: (a: *const u8, b: *const u8) -> i32
    /// returning negative (a < b), zero (a == b), positive (a > b).
    #[unsafe(no_mangle)]
    pub extern "C" fn ori_list_sort(
        list: OriList,
        elem_size: usize,
        elem_align: usize,
        compare_fn: extern "C" fn(*const u8, *const u8) -> i32,
    ) -> OriList { ... }
  • Sort algorithm choice: Uses Rust’s Vec::sort_unstable_by (pattern-defeating quicksort) on an index array, then applies the permutation. Unstable sort by default — no allocation beyond the index and visited arrays. (2026-02-27)

  • Stable sort: Add sort_stable method using Rust’s Vec::sort_by (TimSort). Same COW pattern as unstable sort, but preserves equal-element order. (2026-02-28, extracted shared list_sort_cow_impl helper with stable: bool flag. Wired through all 8 pipeline locations: TYPECK, resolve_list_method, MethodNames, evaluator dispatch, EVAL_BUILTIN_METHODS, ARC CONSUMING_RECEIVER_METHOD_NAMES, LLVM dispatch+emitter, runtime_decl+AOT_ONLY list. Tests: 3 runtime unit tests (stability, all-equal, shared COW), 2 AOT integration, 6 Ori spec tests.)

  • Unit tests: (2026-02-27, 12 tests: 5 reverse (unique even/odd, shared, single, empty) + 7 sort (unique, shared, already-sorted, reverse-sorted, duplicates, single, empty). All pass, zero leaks.)

    • Reverse unique list → same data pointer, elements reversed
    • Reverse shared list → new allocation, original unchanged
    • Sort unique list → same data pointer, elements sorted
    • Sort shared list → new allocation
    • Sort already-sorted list (best case)
    • Sort reverse-sorted list (worst case for naive quicksort)
    • Sort with duplicates
    • Sort empty and single-element lists

02.7 LLVM Codegen Updates

File(s): compiler/ori_llvm/src/codegen/arc_emitter/builtins/collections.rs, compiler/ori_llvm/src/codegen/runtime_decl/mod.rs

All existing list method emitters must be updated to call the new COW runtime functions instead of the old _new variants.

  • Update emit_list_push to call ori_list_push_cow instead of ori_list_push_new: (2026-02-27, passes elem_size + elem_align, extracts cap field for COW fast path)

    • Pass elem_size and elem_align as additional arguments
    • Handle element RC on the slow path (emit inc loop for RC-typed elements)
  • Update emit_list_pop → read-only last(): (2026-02-27) pop() in Ori’s value semantics acts as a read-only accessor returning Option<T> (same as last()). LLVM codegen routes pop to emit_list_last() with borrow: true. Evaluator wired: pop dispatches as alias for last in dispatch_list_method. COW ori_list_pop_cow preserved with #[expect(dead_code)] for future mutation syntax. Spec tests, AOT tests, Rosetta stack test updated.

  • Update emit_list_setori_list_set_cow: (2026-02-27, added TYPECK entry, wired dispatch, evaluator implementation with bounds checking)

    • Emit ori_rc_dec for the OLD element before the set call
    • The new element is moved in (no inc)
  • Update emit_list_concatori_list_concat_cow: (2026-02-27, passes cap1, uses elem_align; also updated + operator in operators.rs)

  • Wire emitters for insert, remove, sort: (2026-02-27, all wired: insert/remove/sort dispatch in mod.rs, TYPECK entries added, evaluator sort uses compare_values with error propagation, LLVM sort generates per-type compare thunks via get_or_create_compare_thunk — supports int/float/bool/char/byte/str)

  • Element RC in slow path: (2026-02-27) Added inc_fn: Option<extern "C" fn(*mut u8)> callback to all 8 COW functions in ori_rt. Runtime calls inc_fn on each byte-copied element during slow path. Codegen generates per-type elem_inc_fn (mirrors elem_dec_fn pattern) via get_or_generate_elem_inc_fn() in ArcIrEmitter. Concat special-cases: inc list2 elements on ALL paths (always byte-copied), inc list1 elements only on slow path. Set/insert skip the overwritten/inserted index. Scalar types pass null (no-op).

  • Update runtime_decl/tests.rs to verify new function signatures: (2026-02-27, all COW functions were already declared in RT_FUNCTIONS from Section 01; declared_functions_covered_by_jit_or_aot_only test passes)


02.8 Completion Checklist

  • ori_list_push — unique list with capacity: same pointer, O(1) (2026-02-27, verified: cow_push_unique_with_capacity asserts same data pointer)
  • ori_list_push — unique list without capacity: realloc, doubled cap (2026-02-27, verified: cow_push_unique_needs_growth asserts cap >= 8)
  • ori_list_push — shared list: new allocation, old untouched (2026-02-27, verified: cow_push_shared_list_copies asserts different pointer + old data intact)
  • ori_list_push — empty (sentinel): allocates MIN_CAPACITY (2026-02-27, verified: cow_push_to_empty_sentinel asserts cap >= 4)
  • ori_list_pop — unique: same pointer, len decremented (2026-02-27, verified: cow_pop_unique_decrements_len asserts same pointer + len=2)
  • ori_list_pop — shared: new allocation (2026-02-27, verified: cow_pop_shared_copies asserts different pointer)
  • ori_list_set — unique: in-place overwrite (2026-02-27, verified: cow_set_unique_overwrites_in_place asserts same pointer)
  • ori_list_set — shared: copy + overwrite (2026-02-27, verified: cow_set_shared_copies asserts different pointer)
  • ori_list_insert — unique with capacity: memmove + write (2026-02-27, verified: cow_insert_unique_at_beginning/middle/end assert same pointer)
  • ori_list_insert — shared: new allocation with element inserted (2026-02-27, verified: cow_insert_shared_copies asserts different pointer)
  • ori_list_remove — unique: memmove left (2026-02-27, verified: cow_remove_unique_at_beginning/middle/end assert same pointer)
  • ori_list_remove — shared: new allocation without element (2026-02-27, verified: cow_remove_shared_copies asserts different pointer)
  • ori_list_concat — unique with capacity: memcpy append (2026-02-27, verified: cow_concat_unique_with_capacity asserts same pointer)
  • ori_list_concat — shared: new allocation with both lists (2026-02-27, verified: cow_concat_shared_list1_unique_list2 asserts different pointer)
  • ori_list_reverse — unique: in-place reverse (2026-02-27, verified: cow_reverse_unique_in_place asserts same pointer + reversed elements)
  • ori_list_sort — unique: in-place sort (2026-02-27, verified: cow_sort_unique_in_place asserts same pointer + sorted elements)
  • All LLVM codegen emitters updated to use COW functions (2026-02-27)
  • Element RC handled correctly on slow path (inc on copy, dec on replace) (2026-02-27)
  • 1000-push benchmark: O(N) total, not O(N²) (2026-02-28, test_coll_list_push_loop_1000 AOT test passes — 1000 pushes with correct first/last values)
  • Valgrind clean: no leaks, no use-after-free, no double-free (2026-02-28, all 7 COW paths clean under Valgrind — push, insert, remove, sort, reverse, concat, chained push+concat)
  • ./test-all.sh green (2026-02-28, 10429 passed / 0 failed — includes ARC consuming fixes)
  • ./clippy-all.sh green (2026-02-27)
  • AOT integration tests for all operations (2026-02-28, 18 new tests in collections_ext.rs: set, insert, remove, sort, push_loop_1000, 5 COW sharing tests, chained push_concat/concat_reverse)

Exit Criteria: A program that pushes 10,000 elements to a list completes in O(N) time (measurable via benchmark). Sharing a list and then mutating the copy triggers exactly one O(N) copy, after which subsequent mutations are O(1). Valgrind reports zero errors on all list COW test programs. ./test-all.sh and ./llvm-test.sh both pass.