
Section 01: Runtime Function Effect Summaries

Status: Not Started

Goal: Eliminate false positive ARC violations caused by runtime functions that allocate or manage RC objects internally. After this section, J9’s arc_violations should drop from 9 to ≤2, and J10’s from 15 to ≤5.

Context: The arc_metrics.py scanner counts ori_rc_inc and ori_rc_dec calls per function and checks if they balance. But ori_str_from_raw allocates a string with RC=1 internally, so the scanner sees 0 incs and 3 decs and flags the function as unbalanced. The Clang Static Analyzer solved this exact problem with RetainSummaryManager: a table mapping function names to their RC effects on parameters and return values.

Reference implementations:

  • Clang clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountChecker.cpp: RetainSummaryManager maps CoreFoundation/Cocoa functions to a RetEffect describing whether the return value is owned (+1) or not owned (+0). This is exactly the pattern we need.
  • Swift include/swift/SILOptimizer/Analysis/RCIdentityAnalysis.h: Traces RC identity through projections (struct_extract, enum_data). Relevant for understanding when a dec on a field equals a dec on the parent.

Depends on: Nothing (foundation section).


01.1 Effect Summary Table

File(s): .claude/skills/code-journey/effect_summaries.py (new)

WARNING (scalability): The RUNTIME_EFFECTS dict currently has ~45 entries (~300 lines of code). Each new pub extern "C" fn ori_* in ori_rt requires a manual entry. The sync test (Section 01.4) will catch missing entries, but if the table grows past ~60 entries, consider auto-generating it from ori_rt function signatures with an #[rc_effect(...)] attribute or a build script.

The effect summary table maps runtime function names to their RC effects. Each entry specifies:

  • Return effect: Does the return value carry a +1 RC? (e.g., ori_str_from_raw returns an owned value with RC=1)
  • Parameter effects: Does any parameter get consumed (-1)? (e.g., ori_rc_dec consumes arg 0)

Terminology: PLUS_ONE (+1) = allocates or returns owned, MINUS_ONE (-1) = consumes or decrements, BORROWED = no RC change but pointer must remain valid, NONE = no RC effect at all. These map to Clang’s RetEffect concepts: +1 retained, -1 consumed, +0 not-owned.

This is the Python equivalent of Clang’s RetainSummaryManager, but much simpler — we have ~137 runtime functions (of which ~40-50 have RC effects), not thousands of Cocoa APIs. Functions with no RC effects (queries, comparisons, printing) are implicit NONE/NONE and don’t need entries — get_effect() returns None for them, which callers treat as “no RC impact.”

  • Create effect_summaries.py with the effect table:

    from dataclasses import dataclass
    from enum import Enum
    
    class RcEffect(Enum):
        """RC effect of a function on a value."""
        NONE = "none"           # No RC effect
        PLUS_ONE = "+1"         # Allocates / returns owned (+1 RC)
        MINUS_ONE = "-1"        # Consumes / decrements (-1 RC)
        BORROWED = "borrowed"   # Borrows (no RC change, must not be freed)
    
    @dataclass
    class FunctionEffect:
        """RC effects of a runtime function."""
        return_effect: RcEffect       # Effect on the return value
        param_effects: list[RcEffect] # Effect on each parameter (by index)
        is_allocation: bool = False   # True if this function creates a new RC object
    
    # Verified against compiler/ori_rt/src/ and
    # compiler/ori_llvm/src/codegen/runtime_decl/runtime_functions.rs
    #
    # Note: Struct-returning functions (ori_str_from_raw, ori_str_concat)
    # use sret convention in LLVM IR, so the +1 RC is on the embedded data
    # pointer, not the return value itself.
    RUNTIME_EFFECTS: dict[str, FunctionEffect] = {
        # --- Allocation (return +1) ---
        "ori_rc_alloc": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.NONE],  # size, align
            is_allocation=True,
        ),
        # Note: ori_str_from_raw returns OriStr struct {len, cap, data} via sret.
        # The +1 RC is on the embedded data pointer, not the struct itself.
        # SSO strings (<=23 bytes) have no heap allocation (data ptr is null).
        "ori_str_from_raw": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,  # data ptr inside returned struct
            param_effects=[RcEffect.BORROWED, RcEffect.NONE],  # src_ptr, len
            is_allocation=True,
        ),
        # Note: ori_str_concat also returns OriStr struct via sret.
        # Params are *const OriStr (pointers to struct, not to data).
        "ori_str_concat": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,  # data ptr inside returned struct
            param_effects=[RcEffect.BORROWED, RcEffect.BORROWED],  # a, b
            is_allocation=True,
        ),
        "ori_list_alloc_data": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.NONE],  # capacity, elem_size
            is_allocation=True,
        ),
        # Note: ori_map_alloc does NOT exist. Map allocation uses
        # ori_map_literal_alloc(count, key_size, val_size, out_cap) -> ptr.
        "ori_map_literal_alloc": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE] * 4,  # count, key_size, val_size, out_cap
            is_allocation=True,
        ),
    
        # --- Deallocation (param -1) ---
        "ori_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE],  # data_ptr, drop_fn
        ),
        "ori_rc_free": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE, RcEffect.NONE],  # data_ptr, size, align
        ),
        # (data, len, cap, elem_size, elem_dec_fn) — 5 params
        "ori_buffer_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 4,
        ),
        "ori_buffer_drop_unique": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 4,
        ),
        # (data, cap, len, key_size, val_size, key_dec_fn, val_dec_fn) — 7 params
        "ori_map_buffer_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 6,
        ),
        "ori_map_buffer_drop_unique": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 6,
        ),
    
        # --- Increment (+1 on param, no return effect) ---
        "ori_rc_inc": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.PLUS_ONE],  # data_ptr — increments RC
        ),
        "ori_list_rc_inc": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.PLUS_ONE, RcEffect.NONE],  # data_ptr, cap
        ),
    
        # --- COW (borrows input, returns new or same pointer) ---
        # COW functions follow the ori_*_cow naming pattern and have no table
        # entries; get_effect() matches them by name.
        # They borrow the input and may return a copy (+1) or the same pointer.
        # For balance purposes: input is borrowed, output is +1.
    
        # --- String methods that return new OriStr (possible +1) ---
        # Each returns OriStr struct; +1 on embedded data ptr if non-SSO.
        "ori_str_from_int": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE],  # n: i64
            is_allocation=True,
        ),
        "ori_str_from_bool": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE],  # b: bool
            is_allocation=True,
        ),
        "ori_str_from_float": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE],  # f: f64
            is_allocation=True,
        ),
        "ori_str_substring": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.NONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_str_trim": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_to_uppercase": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_to_lowercase": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_replace": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.BORROWED, RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_repeat": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_str_push_char": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
    
        # --- Format functions (return OriStr, possible +1) ---
        "ori_format_int": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_float": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_str": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_bool": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_char": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
    
        # --- Set allocation ---
        "ori_set_literal_alloc": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.NONE, RcEffect.NONE],  # count, elem_size, out_cap
            is_allocation=True,
        ),
        "ori_set_empty": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[],
            is_allocation=True,
        ),
    
        # --- List allocation (additional) ---
        "ori_list_empty": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[],
            is_allocation=True,
        ),
    
        # --- Set RC operations ---
        "ori_set_buffer_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 5,
        ),
        "ori_set_buffer_drop_unique": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 5,
        ),
    
        # --- Iterator sources (allocate heap-backed iterator state) ---
        "ori_iter_from_list": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED] * 3,  # data, len, elem_size
            is_allocation=True,
        ),
        "ori_iter_from_range": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE] * 4,  # start, end, step, inclusive
            is_allocation=True,
        ),
        "ori_iter_from_str": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_iter_from_map": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED] * 4,
            is_allocation=True,
        ),
    
        # --- Iterator adapters (consume input iterator, return new) ---
        "ori_iter_map": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_filter": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_take": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_skip": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_enumerate": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE],
            is_allocation=True,
        ),
        "ori_iter_zip": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.MINUS_ONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_chain": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.MINUS_ONE],
            is_allocation=True,
        ),
    
        # --- Iterator consumers (consume input iterator, no new allocation) ---
        "ori_iter_drop": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE],
        ),
    
        # --- Catch/recover (returns OriStr) ---
        "ori_catch_recover": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[],
            is_allocation=True,
        ),
    }
    
    def get_effect(func_name: str) -> FunctionEffect | None:
        """Look up the RC effect of a runtime function.
    
        Returns None for unknown functions (user-defined or unrecognized runtime).
        COW functions (ori_*_cow) are handled by pattern matching.
        """
        if func_name in RUNTIME_EFFECTS:
            return RUNTIME_EFFECTS[func_name]
        # COW pattern: ori_{type}_{op}_cow
        if func_name.startswith("ori_") and func_name.endswith("_cow"):
            return FunctionEffect(
                return_effect=RcEffect.PLUS_ONE,  # COW returns owned
                param_effects=[RcEffect.BORROWED],  # input is borrowed
            )
        return None
    
    def is_allocation_function(func_name: str) -> bool:
        """Check if a function is known to allocate (return +1)."""
        effect = get_effect(func_name)
        return effect is not None and effect.is_allocation
  • Verify all entries against compiler/ori_rt/src/ (read the actual Rust function signatures)

  • Verify all entries against compiler/ori_llvm/src/codegen/runtime_decl/runtime_functions.rs

  • Add tests in .claude/skills/code-journey/tests/test_effect_summaries.py
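A minimal sketch of what tests/test_effect_summaries.py could cover, assuming pytest-style test functions. To stay self-contained it redefines a trimmed copy of the types and three table entries; the real tests would import them from effect_summaries instead. The test names and the COW function name ori_list_push_cow are illustrative assumptions, not names taken from ori_rt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RcEffect(Enum):
    NONE = "none"
    PLUS_ONE = "+1"
    MINUS_ONE = "-1"
    BORROWED = "borrowed"


@dataclass
class FunctionEffect:
    return_effect: RcEffect
    param_effects: list
    is_allocation: bool = False


# Trimmed stand-in for RUNTIME_EFFECTS (three entries copied from the table).
RUNTIME_EFFECTS = {
    "ori_rc_alloc": FunctionEffect(RcEffect.PLUS_ONE, [RcEffect.NONE] * 2, True),
    "ori_rc_dec": FunctionEffect(RcEffect.NONE, [RcEffect.MINUS_ONE, RcEffect.NONE]),
    "ori_iter_take": FunctionEffect(
        RcEffect.PLUS_ONE, [RcEffect.MINUS_ONE, RcEffect.NONE], True
    ),
}


def get_effect(func_name: str) -> Optional[FunctionEffect]:
    if func_name in RUNTIME_EFFECTS:
        return RUNTIME_EFFECTS[func_name]
    if func_name.startswith("ori_") and func_name.endswith("_cow"):
        return FunctionEffect(RcEffect.PLUS_ONE, [RcEffect.BORROWED])
    return None


def test_allocations_return_plus_one():
    # Every entry flagged is_allocation must also return +1.
    for name, effect in RUNTIME_EFFECTS.items():
        if effect.is_allocation:
            assert effect.return_effect is RcEffect.PLUS_ONE, name


def test_unknown_function_has_no_effect():
    # Functions without RC effects have no entry and yield None.
    assert get_effect("ori_str_len") is None


def test_cow_wildcard():
    # Hypothetical COW name; only the ori_*_cow shape matters here.
    effect = get_effect("ori_list_push_cow")
    assert effect is not None
    assert effect.return_effect is RcEffect.PLUS_ONE
    assert effect.is_allocation is False
```

The last test also documents a property of the table as written: COW matches return +1 but are not flagged as allocations, so is_allocation_function() reports False for them.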

Also missing from _RC_DEC_RE in arc_metrics.py (these are included in the effect table above but the current _RC_DEC_RE regex does not match them):

  • Add ori_set_buffer_rc_dec to _RC_DEC_RE — set buffer decrement
  • Add ori_set_buffer_drop_unique to _RC_DEC_RE — set buffer unique drop

Verification checklist for effect table completeness:

  • All iterator sources (ori_iter_from_list, ori_iter_from_range, etc.) present with +1
  • All iterator adapters (ori_iter_map, ori_iter_filter, etc.) present with -1 input / +1 output
  • ori_iter_drop present with -1
  • All string methods returning OriStr (ori_str_substring, ori_str_trim, etc.) present with +1
  • All format functions (ori_format_int, ori_format_float, etc.) present with +1
  • ori_set_literal_alloc, ori_set_empty, ori_list_empty present with +1

01.2 Integration into arc_metrics.py

File(s): .claude/skills/code-journey/arc_metrics.py

Update the ARC metrics extractor to use effect summaries for balance checking.

  • Fix existing _RC_DEC_RE in arc_metrics.py to include ori_map_buffer_rc_dec (currently missing from the dec pattern, which undercounts map buffer decrements)
  • Import get_effect and RcEffect from effect_summaries into arc_metrics.py
  • In _count_rc_ops(), also count allocation functions as implicit +1:
    def _count_rc_ops(func: Function) -> tuple[int, int]:
        """Count RC inc and dec operations, including implicit allocations."""
        inc = 0
        dec = 0
        for block in func.blocks:
            for instr in block.instructions:
                # Explicit RC inc
                if _RC_INC_RE.search(instr.text):
                    inc += 1
                # Explicit RC dec
                if _RC_DEC_RE.search(instr.text):
                    dec += 1
                # Implicit allocation (+1) from runtime functions
                callee = _extract_callee(instr.text)
                if callee:
                    effect = get_effect(callee)
                    if effect and effect.return_effect == RcEffect.PLUS_ONE:
                        inc += 1  # Counts as an implicit RC inc
        return inc, dec
  • Add _extract_callee() helper to pull function name from call/invoke instructions (must handle both call and invoke, quoted names @"...", and sret calling convention where the first argument is a return pointer):
    _CALLEE_RE = re.compile(r'(?:call|invoke)\b[^@]*@(?:"([^"]+)"|(\S+?))\s*\(')
    
    def _extract_callee(text: str) -> str | None:
        """Extract the callee function name from a call or invoke instruction."""
        m = _CALLEE_RE.search(text)
        if not m:
            return None
        return m.group(1) or m.group(2)
  • Update tests: J9 should now show balanced RC (the +1 from ori_str_from_raw plus the 2 other allocations balance the 3 ori_rc_dec calls)
  • Preserve backward compatibility: effect summaries ADD information, don’t change existing detection
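To illustrate the steps above end to end, here is a self-contained sketch that runs the callee regex and the summary-aware counting over a few hand-written IR lines. The inc/dec patterns, the PLUS_ONE_RETURNS set, and the sample IR are stand-ins assumed for this example (the IR is not real J9 output), and instructions are modeled as plain strings rather than the scanner's Function objects.

```python
import re
from typing import Optional

_CALLEE_RE = re.compile(r'(?:call|invoke)\b[^@]*@(?:"([^"]+)"|(\S+?))\s*\(')

# Stand-ins for the real patterns and effect table (assumptions).
_RC_INC_RE = re.compile(r"@ori_rc_inc\s*\(")
_RC_DEC_RE = re.compile(r"@ori_rc_dec\s*\(")
PLUS_ONE_RETURNS = {"ori_str_from_raw", "ori_str_from_int", "ori_str_concat"}


def extract_callee(text: str) -> Optional[str]:
    """Return the callee name of a call/invoke instruction, if any."""
    m = _CALLEE_RE.search(text)
    return (m.group(1) or m.group(2)) if m else None


def count_rc_ops(instructions, use_summaries=True):
    """Count (inc, dec); optionally treat +1-returning callees as incs."""
    inc = dec = 0
    for text in instructions:
        if _RC_INC_RE.search(text):
            inc += 1
        if _RC_DEC_RE.search(text):
            dec += 1
        if use_summaries and extract_callee(text) in PLUS_ONE_RETURNS:
            inc += 1
    return inc, dec


# Hand-written IR resembling a string-building function.
SAMPLE = [
    "call void @ori_str_from_raw(ptr sret(%OriStr) %a, ptr @.str, i64 40)",
    "call void @ori_str_from_int(ptr sret(%OriStr) %b, i64 %n)",
    'invoke void @"ori_str_concat"(ptr sret(%OriStr) %c, ptr %a, ptr %b)'
    " to label %ok unwind label %lp",
    "call void @ori_rc_dec(ptr %a.data, ptr null)",
    "call void @ori_rc_dec(ptr %b.data, ptr null)",
    "call void @ori_rc_dec(ptr %c.data, ptr null)",
]

print(count_rc_ops(SAMPLE, use_summaries=False))  # (0, 3)
print(count_rc_ops(SAMPLE))                       # (3, 3)
```

Without summaries the sample reproduces the "0 incs and 3 decs" shape described in the Context paragraph; with them, each of the three allocating calls contributes one implicit inc.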

01.3 Integration into rc_balance.rs (Rust verifier)

File(s): compiler/ori_llvm/src/verify/rc_balance.rs

The in-pipeline Rust verifier has the same blind spot. Update it to recognize allocation functions beyond just ori_rc_alloc.

  • Add the same summary table concept to rc_balance.rs:

    /// Runtime functions that allocate RC-managed memory and return the
    /// owned pointer directly as the call result.
    ///
    /// `ori_rc_alloc` and `ori_list_alloc_data` return a `ptr` directly.
    /// `ori_map_literal_alloc` returns a `ptr` to the hash table buffer;
    /// `ori_set_literal_alloc` likewise returns a `ptr`.
    /// Sret-returning functions (`ori_str_from_raw`, `ori_str_concat`,
    /// `ori_format_*`) are deliberately excluded: their RC-managed data
    /// pointer is embedded in a struct written through the sret parameter.
    const RC_ALLOCATION_FUNCTIONS: &[&str] = &[
        "ori_rc_alloc",
        "ori_list_alloc_data",
        "ori_map_literal_alloc",
        "ori_set_literal_alloc",
    ];
  • Update process_call() match to check RC_ALLOCATION_FUNCTIONS instead of just "ori_rc_alloc"

  • Note: Only add direct-ptr-returning functions (ori_list_alloc_data, ori_map_literal_alloc, ori_set_literal_alloc) to RC_ALLOCATION_FUNCTIONS in rc_balance.rs. Sret-returning functions (ori_str_from_raw, ori_str_concat, ori_format_*) write their return value through an sret pointer parameter (not the call result), requiring different tracking — defer to a follow-up.

  • Add test in compiler/ori_llvm/src/verify/tests.rs for the new allocation recognition

  • Run cargo test -p ori_llvm -- verify to validate
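The note above says sret-returning functions need different tracking because the owned value lands in a pointer argument rather than in the call result. A rough sketch of that distinction follows, written in Python (the scanner's language) rather than Rust; the regexes, the helper name owned_value_location, and the IR lines are all assumptions for illustration, not the verifier's actual logic.

```python
import re
from typing import Optional, Tuple

# Captures the SSA name on the left of "%x = call ..." or "%x = invoke ...".
_RESULT_RE = re.compile(r"^\s*(%[\w.$-]+)\s*=\s*(?:tail\s+)?(?:call|invoke)\b")
# Captures the operand of an argument carrying the sret attribute,
# skipping any further attributes such as "align 8".
_SRET_RE = re.compile(r"\bsret\([^)]*\)[^,%]*(%[\w.$-]+)")


def owned_value_location(text: str) -> Optional[Tuple[str, str]]:
    """Return ("sret", operand), ("result", name), or None."""
    m = _SRET_RE.search(text)
    if m:
        return ("sret", m.group(1))
    m = _RESULT_RE.search(text)
    if m:
        return ("result", m.group(1))
    return None


# Hand-written IR lines, not real compiler output.
direct = "%buf = call ptr @ori_list_alloc_data(i64 8, i64 8)"
via_sret = "call void @ori_str_from_raw(ptr sret(%OriStr) align 8 %tmp, ptr %src, i64 %len)"

print(owned_value_location(direct))    # ('result', '%buf')
print(owned_value_location(via_sret))  # ('sret', '%tmp')
```

For the direct case the verifier can track the call result as the owned pointer; for the sret case it would have to follow loads of the data field out of the destination struct, which is the follow-up work the note defers.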


01.4 Completion Checklist

  • effect_summaries.py exists with all RC-affecting runtime functions mapped
  • All entries verified against ori_rt source and runtime_functions.rs
  • arc_metrics.py uses effect summaries for balance checking
  • rc_balance.rs recognizes all direct-ptr-returning allocation functions (sret-returning functions deferred to a follow-up)
  • J9 (strings): arc_violations ≤ 2 (down from 9)
  • J10 (lists): arc_violations ≤ 5 (down from 15)
  • All existing tests pass: python3 -m pytest tests/ and cargo test -p ori_llvm
  • No regressions on J1-J4, J6-J8, J11-J12

Sync test requirement:

  • Add a test that greps compiler/ori_rt/src/ for all pub extern "C" fn ori_* signatures and verifies each one either:
    1. Has an entry in RUNTIME_EFFECTS, OR
    2. Matches the COW wildcard pattern (ori_*_cow), OR
    3. Is explicitly listed in a NO_RC_EFFECT allowlist (e.g., ori_str_eq, ori_str_len, ori_print: functions with no RC effects)

    This prevents drift when new runtime functions are added.
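The sync test described above could be sketched roughly as follows. The regex, the helper names, and the sample Rust source are assumptions for illustration (ori_brand_new and ori_list_push_cow are made-up names, and the signatures are not copied from ori_rt); the real test would pass the RUNTIME_EFFECTS keys and the NO_RC_EFFECT allowlist, and would call scan_tree on compiler/ori_rt/src/.

```python
import re
from pathlib import Path

# Matches `pub extern "C" fn ori_name`; also tolerates `pub unsafe extern "C" fn`.
_EXTERN_FN_RE = re.compile(r'pub\s+(?:unsafe\s+)?extern\s+"C"\s+fn\s+(ori_\w+)')


def exported_names(rust_source: str) -> set:
    """Collect exported ori_* function names from one Rust source string."""
    return set(_EXTERN_FN_RE.findall(rust_source))


def scan_tree(root: Path) -> set:
    """Collect exported ori_* names from every .rs file under root."""
    names = set()
    for path in root.rglob("*.rs"):
        names |= exported_names(path.read_text(encoding="utf-8"))
    return names


def unclassified(names, table_keys, no_rc_effect):
    """Names covered by neither the table, the COW wildcard, nor the allowlist."""
    return sorted(
        n for n in names
        if n not in table_keys
        and not n.endswith("_cow")  # every name already starts with "ori_"
        and n not in no_rc_effect
    )


# Hand-written stand-in for runtime source.
SAMPLE_RS = '''
pub extern "C" fn ori_rc_inc(data_ptr: *mut u8) {}
pub extern "C" fn ori_str_len(s: *const OriStr) -> i64 { 0 }
pub extern "C" fn ori_list_push_cow(data: *mut u8) -> *mut u8 { data }
pub extern "C" fn ori_brand_new(x: i64) -> *mut u8 { core::ptr::null_mut() }
'''

missing = unclassified(
    exported_names(SAMPLE_RS),
    table_keys={"ori_rc_inc"},
    no_rc_effect={"ori_str_len"},
)
print(missing)  # ['ori_brand_new']
```

In the real test, a non-empty result would fail with the list of names, telling the author of a new runtime function which of the three buckets still needs an entry.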

Exit Criteria: Running extract-metrics.py on J9’s IR produces arc_violations ≤ 2 and arc_has_unbalanced: false for string construction/destruction functions. The remaining violations (if any) are genuine codegen issues, not scanner blind spots.