Section 01: Runtime Function Effect Summaries
Status: Not Started
Goal: Eliminate false positive ARC violations caused by runtime functions that allocate or manage RC objects internally. After this section, J9’s arc_violations should drop from 9 to ≤2, and J10’s from 15 to ≤5.
Context: The arc_metrics.py scanner counts ori_rc_inc and ori_rc_dec calls per function and checks whether they balance. But ori_str_from_raw allocates a string with RC=1 internally, so the scanner sees 0 incs and 3 decs and flags the function as unbalanced. The Clang Static Analyzer solved this exact problem with RetainSummaryManager: a table mapping function names to their RC effects on parameters and return values.
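To make the blind spot concrete, here is a minimal sketch under a simplifying assumption: each instruction is reduced to just its callee name. The helpers `naive_balance` and `summary_balance`, the `RETURNS_OWNED` set, and the sample trace are hypothetical illustrations, not the scanner's actual code.

```python
# Hypothetical, simplified model: each "instruction" is just a callee name.
RETURNS_OWNED = {"ori_str_from_raw", "ori_str_concat"}  # assumed +1 returns


def naive_balance(callees: list[str]) -> int:
    """Count only explicit ori_rc_inc / ori_rc_dec calls (inc minus dec)."""
    inc = sum(1 for c in callees if c == "ori_rc_inc")
    dec = sum(1 for c in callees if c == "ori_rc_dec")
    return inc - dec


def summary_balance(callees: list[str]) -> int:
    """Also credit +1 for callees known to return an owned value."""
    owned = sum(1 for c in callees if c in RETURNS_OWNED)
    return naive_balance(callees) + owned


# Illustrative trace: three allocating calls, each later released.
trace = [
    "ori_str_from_raw", "ori_str_from_raw", "ori_str_concat",
    "ori_rc_dec", "ori_rc_dec", "ori_rc_dec",
]
print(naive_balance(trace))    # -3: reported as unbalanced
print(summary_balance(trace))  # 0: balanced once allocations count as +1
```

The naive count reproduces the "0 incs, 3 decs" situation; crediting known allocations is what the effect summary table is meant to supply.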
Reference implementations:
- Clang clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountChecker.cpp: RetainSummaryManager maps CoreFoundation/Cocoa functions to RetEffect (returns +1, +0, or owned). This is exactly the pattern we need.
- Swift include/swift/SILOptimizer/Analysis/RCIdentityAnalysis.h: Traces RC identity through projections (struct_extract, enum_data). Relevant for understanding when a dec on a field equals a dec on the parent.
Depends on: Nothing (foundation section).
01.1 Effect Summary Table
File(s): .claude/skills/code-journey/effect_summaries.py (new)
WARNING (scalability): The RUNTIME_EFFECTS dict currently has ~45 entries (~300 lines of code). Each new pub extern "C" fn ori_* in ori_rt requires a manual entry. The sync test (Section 01.4) will catch missing entries, but if the table grows past ~60 entries, consider auto-generating it from ori_rt function signatures with an #[rc_effect(...)] attribute or a build script.
The effect summary table maps runtime function names to their RC effects. Each entry specifies:
- Return effect: Does the return value carry a +1 RC? (e.g., ori_str_from_raw returns an owned value with RC=1)
- Parameter effects: Does any parameter get consumed (-1)? (e.g., ori_rc_dec consumes arg 0)
Terminology: PLUS_ONE (+1) = allocates or returns owned, MINUS_ONE (-1) = consumes or decrements, BORROWED = no RC change but pointer must remain valid, NONE = no RC effect at all. These map to Clang’s RetEffect concepts: +1 retained, -1 consumed, +0 not-owned.
This is the Python equivalent of Clang’s RetainSummaryManager, but much simpler — we have ~137 runtime functions (of which ~40-50 have RC effects), not thousands of Cocoa APIs. Functions with no RC effects (queries, comparisons, printing) are implicit NONE/NONE and don’t need entries — get_effect() returns None for them, which callers treat as “no RC impact.”
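As a rough illustration of the terminology, the sketch below turns each effect into the net RC change a balance checker would apply. It re-declares a trimmed copy of the RcEffect values so it runs on its own; the helper name `rc_delta` is hypothetical and only shows how "no entry" and BORROWED can both be treated as zero.

```python
from enum import Enum
from typing import Optional


class RcEffect(Enum):
    """Trimmed copy of the RcEffect terminology, repeated so the sketch runs alone."""
    NONE = "none"
    PLUS_ONE = "+1"
    MINUS_ONE = "-1"
    BORROWED = "borrowed"


def rc_delta(effect: Optional[RcEffect]) -> int:
    """Net RC change implied by an effect; None means 'no entry, no RC impact'."""
    if effect is RcEffect.PLUS_ONE:
        return 1
    if effect is RcEffect.MINUS_ONE:
        return -1
    return 0  # NONE, BORROWED, and missing entries all leave the count alone


print(rc_delta(RcEffect.PLUS_ONE))  # 1
print(rc_delta(RcEffect.BORROWED))  # 0
print(rc_delta(None))               # 0
```

BORROWED and NONE contribute the same delta here; the distinction matters only for lifetime reasoning (the pointer must stay valid), not for counting.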
- Create effect_summaries.py with the effect table:

    from dataclasses import dataclass
    from enum import Enum


    class RcEffect(Enum):
        """RC effect of a function on a value."""
        NONE = "none"          # No RC effect
        PLUS_ONE = "+1"        # Allocates / returns owned (+1 RC)
        MINUS_ONE = "-1"       # Consumes / decrements (-1 RC)
        BORROWED = "borrowed"  # Borrows (no RC change, must not be freed)


    @dataclass
    class FunctionEffect:
        """RC effects of a runtime function."""
        return_effect: RcEffect        # Effect on the return value
        param_effects: list[RcEffect]  # Effect on each parameter (by index)
        is_allocation: bool = False    # True if this function creates a new RC object


    # Verified against compiler/ori_rt/src/ and
    # compiler/ori_llvm/src/codegen/runtime_decl/runtime_functions.rs
    #
    # Note: Struct-returning functions (ori_str_from_raw, ori_str_concat)
    # use sret convention in LLVM IR, so the +1 RC is on the embedded data
    # pointer, not the return value itself.
    RUNTIME_EFFECTS: dict[str, FunctionEffect] = {
        # --- Allocation (return +1) ---
        "ori_rc_alloc": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.NONE],  # size, align
            is_allocation=True,
        ),
        # Note: ori_str_from_raw returns OriStr struct {len, cap, data} via sret.
        # The +1 RC is on the embedded data pointer, not the struct itself.
        # SSO strings (<=23 bytes) have no heap allocation (data ptr is null).
        "ori_str_from_raw": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,  # data ptr inside returned struct
            param_effects=[RcEffect.BORROWED, RcEffect.NONE],  # src_ptr, len
            is_allocation=True,
        ),
        # Note: ori_str_concat also returns OriStr struct via sret.
        # Params are *const OriStr (pointers to struct, not to data).
        "ori_str_concat": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,  # data ptr inside returned struct
            param_effects=[RcEffect.BORROWED, RcEffect.BORROWED],  # a, b
            is_allocation=True,
        ),
        "ori_list_alloc_data": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.NONE],  # capacity, elem_size
            is_allocation=True,
        ),
        # Note: ori_map_alloc does NOT exist. Map allocation uses
        # ori_map_literal_alloc(count, key_size, val_size, out_cap) -> ptr.
        "ori_map_literal_alloc": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE] * 4,  # count, key_size, val_size, out_cap
            is_allocation=True,
        ),

        # --- Deallocation (param -1) ---
        "ori_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE],  # data_ptr, drop_fn
        ),
        "ori_rc_free": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE, RcEffect.NONE],  # data_ptr, size, align
        ),
        # (data, len, cap, elem_size, elem_dec_fn): 5 params
        "ori_buffer_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 4,
        ),
        "ori_buffer_drop_unique": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 4,
        ),
        # (data, cap, len, key_size, val_size, key_dec_fn, val_dec_fn): 7 params
        "ori_map_buffer_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 6,
        ),
        "ori_map_buffer_drop_unique": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 6,
        ),

        # --- Increment (+1 on param, no return effect) ---
        "ori_rc_inc": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.PLUS_ONE],  # data_ptr: increments RC
        ),
        "ori_list_rc_inc": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.PLUS_ONE, RcEffect.NONE],  # data_ptr, cap
        ),

        # --- COW (may consume input, returns new or same) ---
        # COW functions follow ori_*_cow naming
        # They borrow the input and may return a copy (+1) or the same pointer
        # For balance purposes: input is borrowed, output is +1

        # --- String methods that return new OriStr (possible +1) ---
        # Each returns OriStr struct; +1 on embedded data ptr if non-SSO.
        "ori_str_from_int": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE],  # n: i64
            is_allocation=True,
        ),
        "ori_str_from_bool": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE],  # b: bool
            is_allocation=True,
        ),
        "ori_str_from_float": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE],  # f: f64
            is_allocation=True,
        ),
        "ori_str_substring": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.NONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_str_trim": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_to_uppercase": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_to_lowercase": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_replace": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.BORROWED, RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_str_repeat": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_str_push_char": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),

        # --- Format functions (return OriStr, possible +1) ---
        "ori_format_int": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_float": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_str": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_bool": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_format_char": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),

        # --- Set allocation ---
        "ori_set_literal_alloc": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE, RcEffect.NONE, RcEffect.NONE],  # count, elem_size, out_cap
            is_allocation=True,
        ),
        "ori_set_empty": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[],
            is_allocation=True,
        ),

        # --- List allocation (additional) ---
        "ori_list_empty": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[],
            is_allocation=True,
        ),

        # --- Set RC operations ---
        "ori_set_buffer_rc_dec": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 5,
        ),
        "ori_set_buffer_drop_unique": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE] + [RcEffect.NONE] * 5,
        ),

        # --- Iterator sources (allocate heap-backed iterator state) ---
        "ori_iter_from_list": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED] * 3,  # data, len, elem_size
            is_allocation=True,
        ),
        "ori_iter_from_range": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.NONE] * 4,  # start, end, step, inclusive
            is_allocation=True,
        ),
        "ori_iter_from_str": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED],
            is_allocation=True,
        ),
        "ori_iter_from_map": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.BORROWED] * 4,
            is_allocation=True,
        ),

        # --- Iterator adapters (consume input iterator, return new) ---
        "ori_iter_map": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_filter": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.BORROWED, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_take": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_skip": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_enumerate": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE],
            is_allocation=True,
        ),
        "ori_iter_zip": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.MINUS_ONE, RcEffect.NONE],
            is_allocation=True,
        ),
        "ori_iter_chain": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[RcEffect.MINUS_ONE, RcEffect.MINUS_ONE],
            is_allocation=True,
        ),

        # --- Iterator consumers (consume input iterator, no new allocation) ---
        "ori_iter_drop": FunctionEffect(
            return_effect=RcEffect.NONE,
            param_effects=[RcEffect.MINUS_ONE],
        ),

        # --- Catch/recover (returns OriStr) ---
        "ori_catch_recover": FunctionEffect(
            return_effect=RcEffect.PLUS_ONE,
            param_effects=[],
            is_allocation=True,
        ),
    }


    def get_effect(func_name: str) -> FunctionEffect | None:
        """Look up the RC effect of a runtime function.

        Returns None for unknown functions (user-defined or unrecognized runtime).
        COW functions (ori_*_cow) are handled by pattern matching.
        """
        if func_name in RUNTIME_EFFECTS:
            return RUNTIME_EFFECTS[func_name]
        # COW pattern: ori_{type}_{op}_cow
        if func_name.startswith("ori_") and func_name.endswith("_cow"):
            return FunctionEffect(
                return_effect=RcEffect.PLUS_ONE,    # COW returns owned
                param_effects=[RcEffect.BORROWED],  # input is borrowed
            )
        return None


    def is_allocation_function(func_name: str) -> bool:
        """Check if a function is known to allocate (return +1)."""
        effect = get_effect(func_name)
        return effect is not None and effect.is_allocation

- Verify all entries against compiler/ori_rt/src/ (read the actual Rust function signatures)
- Verify all entries against compiler/ori_llvm/src/codegen/runtime_decl/runtime_functions.rs
- Add tests in .claude/skills/code-journey/tests/test_effect_summaries.py
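One shape the table tests could take is a structural sanity check. The sketch below is an assumption about what is worth testing, not the planned test file: `Entry` is a hypothetical string-based stand-in for FunctionEffect, and the rule "an allocation entry must return +1" is inferred from the table, where every is_allocation=True entry has a +1 return effect.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for FunctionEffect, using plain strings."""
    return_effect: str
    param_effects: list[str] = field(default_factory=list)
    is_allocation: bool = False


VALID = {"none", "+1", "-1", "borrowed"}


def table_problems(table: dict[str, Entry]) -> list[str]:
    """Return human-readable problems; an empty list means the table looks sane."""
    problems = []
    for name, e in table.items():
        if not name.startswith("ori_"):
            problems.append(f"{name}: runtime names should start with ori_")
        if e.is_allocation and e.return_effect != "+1":
            problems.append(f"{name}: allocation must return +1")
        if e.return_effect not in VALID or any(p not in VALID for p in e.param_effects):
            problems.append(f"{name}: unknown effect value")
    return problems


good = {"ori_rc_alloc": Entry("+1", ["none", "none"], is_allocation=True)}
bad = {"ori_rc_dec": Entry("none", ["-1", "none"], is_allocation=True)}
print(table_problems(good))  # []
print(table_problems(bad))   # ['ori_rc_dec: allocation must return +1']
```

A check like this catches copy-paste mistakes in a hand-maintained table; it does not replace verifying each entry against the Rust signatures.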
Also missing from _RC_DEC_RE in arc_metrics.py (these are included in the effect table above but the current _RC_DEC_RE regex does not match them):
- Add ori_set_buffer_rc_dec to _RC_DEC_RE: set buffer decrement
- Add ori_set_buffer_drop_unique to _RC_DEC_RE: set buffer unique drop
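The existing _RC_DEC_RE is not shown here, so the following is only a guess at one workable shape once the set functions are added. The name list and `RC_DEC_SKETCH_RE` are hypothetical, and the sample IR lines are invented for illustration.

```python
import re

# Hypothetical shape for the dec pattern; the real _RC_DEC_RE may differ.
_DEC_NAMES = [
    "ori_rc_dec",
    "ori_buffer_rc_dec",
    "ori_buffer_drop_unique",
    "ori_map_buffer_rc_dec",
    "ori_map_buffer_drop_unique",
    "ori_set_buffer_rc_dec",
    "ori_set_buffer_drop_unique",
]
# Anchor on '@' and '(' so ori_rc_dec cannot match inside ori_buffer_rc_dec.
RC_DEC_SKETCH_RE = re.compile(r"@(?:%s)\s*\(" % "|".join(map(re.escape, _DEC_NAMES)))

set_dec = "call void @ori_set_buffer_rc_dec(ptr %d, i64 %len, i64 %cap, i64 8, ptr null, ptr null)"
rc_inc = "call void @ori_rc_inc(ptr %p)"
print(bool(RC_DEC_SKETCH_RE.search(set_dec)))  # True
print(bool(RC_DEC_SKETCH_RE.search(rc_inc)))   # False
```

Building the alternation from a list keeps the regex and the effect table easy to compare by eye when new dec functions appear.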
Verification checklist for effect table completeness:
- All iterator sources (ori_iter_from_list, ori_iter_from_range, etc.) present with +1
- All iterator adapters (ori_iter_map, ori_iter_filter, etc.) present with -1 input / +1 output
- ori_iter_drop present with -1
- All string methods returning OriStr (ori_str_substring, ori_str_trim, etc.) present with +1
- All format functions (ori_format_int, ori_format_float, etc.) present with +1
- ori_set_literal_alloc, ori_set_empty, ori_list_empty present with +1
01.2 Integration into arc_metrics.py
File(s): .claude/skills/code-journey/arc_metrics.py
Update the ARC metrics extractor to use effect summaries for balance checking.
- Fix existing _RC_DEC_RE in arc_metrics.py to include ori_map_buffer_rc_dec (currently missing from the dec pattern, which undercounts map buffer decrements)
- Import get_effect and RcEffect from effect_summaries into arc_metrics.py
- In _count_rc_ops(), also count allocation functions as implicit +1:

    def _count_rc_ops(func: Function) -> tuple[int, int]:
        """Count RC inc and dec operations, including implicit allocations."""
        inc = 0
        dec = 0
        for block in func.blocks:
            for instr in block.instructions:
                # Explicit RC inc
                if _RC_INC_RE.search(instr.text):
                    inc += 1
                # Explicit RC dec
                if _RC_DEC_RE.search(instr.text):
                    dec += 1
                # Implicit allocation (+1) from runtime functions
                callee = _extract_callee(instr.text)
                if callee:
                    effect = get_effect(callee)
                    if effect and effect.return_effect == RcEffect.PLUS_ONE:
                        inc += 1  # Counts as an implicit RC inc
        return inc, dec

- Add _extract_callee() helper to pull function name from call/invoke instructions (must handle both call and invoke, quoted names @"...", and sret calling convention where the first argument is a return pointer):

    _CALLEE_RE = re.compile(r'(?:call|invoke)\b[^@]*@(?:"([^"]+)"|(\S+?))\s*\(')

    def _extract_callee(text: str) -> str | None:
        """Extract the callee function name from a call or invoke instruction."""
        m = _CALLEE_RE.search(text)
        if not m:
            return None
        return m.group(1) or m.group(2)

- Update tests: J9 should now show balanced RC (the +1 from ori_str_from_raw, together with the 2 other allocations, balances the 3 ori_rc_dec calls)
- Preserve backward compatibility: effect summaries ADD information; they don't change existing detection
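A quick way to sanity-check the callee pattern is to run it over representative instruction shapes. The sketch below repeats the regex from the _extract_callee() helper so it runs on its own; the IR lines are hand-written examples, not output captured from the compiler.

```python
import re
from typing import Optional

# Same pattern as the _extract_callee() helper, repeated so the sketch runs alone.
_CALLEE_RE = re.compile(r'(?:call|invoke)\b[^@]*@(?:"([^"]+)"|(\S+?))\s*\(')


def extract_callee(text: str) -> Optional[str]:
    """Return the callee name of a call/invoke instruction, or None."""
    m = _CALLEE_RE.search(text)
    return (m.group(1) or m.group(2)) if m else None


samples = [
    # Direct pointer return.
    "%p = call ptr @ori_rc_alloc(i64 24, i64 8)",
    # sret: result is written through the first argument, callee is unchanged.
    "call void @ori_str_from_raw(ptr sret(%OriStr) %out, ptr %src, i64 5)",
    # invoke with a quoted name.
    'invoke void @"ori_str_concat"(ptr sret(%OriStr) %r, ptr %a, ptr %b) to label %ok unwind label %err',
    # Not a call at all.
    "%x = add i64 %a, %b",
]
for s in samples:
    print(extract_callee(s))
# ori_rc_alloc
# ori_str_from_raw
# ori_str_concat
# None
```

For sret calls the helper still reports the callee correctly; what it cannot tell you is which SSA value holds the owned pointer, which is why the Rust verifier treats those functions separately.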
01.3 Integration into rc_balance.rs (Rust verifier)
File(s): compiler/ori_llvm/src/verify/rc_balance.rs
The in-pipeline Rust verifier has the same blind spot. Update it to recognize allocation functions beyond just ori_rc_alloc.
- Add the same summary table concept to rc_balance.rs:

    /// Runtime functions that allocate RC-managed memory and return the
    /// pointer directly as the call result.
    ///
    /// `ori_rc_alloc` and `ori_list_alloc_data` return a `ptr` directly.
    /// `ori_map_literal_alloc` returns a `ptr` to the hash table buffer.
    /// `ori_str_from_raw` and `ori_str_concat` return `OriStr` structs via
    /// sret (the RC-managed data pointer is embedded inside the struct), so
    /// they are deliberately not listed here; see the note below.
    const RC_ALLOCATION_FUNCTIONS: &[&str] = &[
        "ori_rc_alloc",
        "ori_list_alloc_data",
        "ori_map_literal_alloc",
        "ori_set_literal_alloc",
    ];

- Update process_call() match to check RC_ALLOCATION_FUNCTIONS instead of just "ori_rc_alloc"
- Note: Only add direct-ptr-returning functions (ori_list_alloc_data, ori_map_literal_alloc, ori_set_literal_alloc) to RC_ALLOCATION_FUNCTIONS in rc_balance.rs. Sret-returning functions (ori_str_from_raw, ori_str_concat, ori_format_*) write their return value through an sret pointer parameter (not the call result), which requires different tracking; defer them to a follow-up.
- Add test in compiler/ori_llvm/src/verify/tests.rs for the new allocation recognition
- Run cargo test -p ori_llvm -- verify to validate
01.4 Completion Checklist
- effect_summaries.py exists with all RC-affecting runtime functions mapped
- All entries verified against ori_rt source and runtime_functions.rs
- arc_metrics.py uses effect summaries for balance checking
- rc_balance.rs recognizes all direct-pointer-returning allocation functions
- J9 (strings): arc_violations ≤ 2 (down from 9)
- J10 (lists): arc_violations ≤ 5 (down from 15)
- All existing tests pass: python3 -m pytest tests/ and cargo test -p ori_llvm
- No regressions on J1-J4, J6-J8, J11-J12
Sync test requirement:
- Add a test that greps compiler/ori_rt/src/ for all pub extern "C" fn ori_* signatures and verifies each one either:
  - Has an entry in RUNTIME_EFFECTS, OR
  - Matches the COW wildcard pattern (ori_*_cow), OR
  - Is explicitly listed in a NO_RC_EFFECT allowlist (e.g., ori_str_eq, ori_str_len, ori_print: functions with no RC effects)
  This prevents drift when new runtime functions are added.
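The sync test could be built around a small classifier like the sketch below. It is an illustration under assumptions: `unclassified` and `_EXTERN_FN_RE` are hypothetical names, the optional `unsafe` in the pattern is an addition not stated in the requirement, and the Rust snippet is an invented stand-in for the files under compiler/ori_rt/src/.

```python
import re

# Matches `pub extern "C" fn ori_<name>` (optionally `pub unsafe extern ...`,
# an assumption) and captures the function name.
_EXTERN_FN_RE = re.compile(r'pub\s+(?:unsafe\s+)?extern\s+"C"\s+fn\s+(ori_\w+)')


def unclassified(rust_source: str, effects: set[str], no_rc_effect: set[str]) -> list[str]:
    """Exported runtime names covered by no table entry, wildcard, or allowlist."""
    missing = []
    for name in _EXTERN_FN_RE.findall(rust_source):
        if name in effects or name in no_rc_effect or name.endswith("_cow"):
            continue
        missing.append(name)
    return sorted(set(missing))


# Hypothetical Rust snippet; signatures are made up for the example.
sample = '''
pub extern "C" fn ori_rc_inc(data_ptr: *mut u8) {}
pub extern "C" fn ori_str_len(s: *const OriStr) -> i64 { 0 }
pub extern "C" fn ori_list_push_cow(data: *mut u8) -> *mut u8 { data }
pub extern "C" fn ori_brand_new(x: i64) -> *mut u8 { core::ptr::null_mut() }
'''
print(unclassified(sample, {"ori_rc_inc"}, {"ori_str_len"}))  # ['ori_brand_new']
```

In the real test the source text would come from reading every .rs file under the runtime directory, and the assertion would be that the returned list is empty.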
Exit Criteria: Running extract-metrics.py on J9’s IR produces arc_violations ≤ 2 and arc_has_unbalanced: false for string construction/destruction functions. The remaining violations (if any) are genuine codegen issues, not scanner blind spots.