Section 04: Block-level RC Stats

Status: Complete

Goal: Give developers the ability to localize RC leaks and over-releases to specific basic blocks within a function, not just the function as a whole. Currently rc-stats.sh reports only per-function totals (“this function has +2 balance”) and cannot tell you WHICH loop or branch is responsible. The fix creates a new raw-counting pass (rc_histogram.rs) that is architecturally separate from the existing semantic lifecycle verifier (rc_balance.rs), emits typed JSON via serde, and updates the shell script to render it.

Critical architectural constraint (from dual-source review): rc_balance.rs is a semantic lifecycle state machine tracking pointer ownership transitions (Live/CowConsumed/Decremented). It deliberately tracks only ori_rc_alloc, ori_rc_dec, and COW calls — this is correct for its purpose. The new per-block counting is a syntactic histogram — it counts ALL 5 RC op types (ori_rc_alloc, ori_rc_inc, ori_rc_dec, ori_rc_free, COW) without tracking pointer identity or state transitions. These MUST be separate modules. Mixing histogram counting into the lifecycle tracker would corrupt the state machine’s invariants. The awk parser in rc-stats.sh already tracks all 5 ops — the new pass must match.

Success Criteria:

Context: Both Codex and Gemini independently identified critical issues with the original plan:

rc_balance.rs does NOT track ori_rc_inc or ori_rc_free — merging counting into it would silently corrupt the balance equation (alloc + inc) - (dec + free).
The codegen audit: prefix is consumed by codegen-audit.sh via grep — adding JSON lines with that prefix would break it.
Per-block exit code 1 would produce false positives because RC ownership commonly crosses basic-block boundaries.
The --optimized flag path originally had no compiler-side equivalent — resolved by adding post-optimization histogram support (04.3).
Removing the awk parser without a migration safety net risks silent numeric divergence.
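
To make the first issue concrete, here is a minimal sketch of the balance equation quoted above. The `Counts` type and both method names are hypothetical and exist only for this illustration; the second method shows what the equation degrades to when only alloc and dec are visible, which is all the lifecycle tracker records.

```rust
/// Raw RC operation counts for one function (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Counts {
    pub alloc: u32,
    pub inc: u32,
    pub dec: u32,
    pub free: u32,
    pub cow: u32,
}

impl Counts {
    /// Balance as stated in the plan: (alloc + inc) - (dec + free).
    /// COW calls are counted but are not a term of this equation.
    pub fn balance(&self) -> i64 {
        (i64::from(self.alloc) + i64::from(self.inc))
            - (i64::from(self.dec) + i64::from(self.free))
    }

    /// The same equation with the inc and free terms missing,
    /// i.e. what a counter limited to alloc and dec would report.
    pub fn balance_alloc_dec_only(&self) -> i64 {
        i64::from(self.alloc) - i64::from(self.dec)
    }
}
```

For a function with one alloc, one inc, and two decs, `balance()` is 0 while `balance_alloc_dec_only()` is -1, so the reduced view reports an over-release that does not exist.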

Reference implementations:

Swift ARC optimizer: tracks retain/release per SIL basic block
Lean 4 IR/RC.lean: per-block inc/dec analysis in the RC insertion pass

Depends on: None.

04.1 Create RcOpKind enum and rc_histogram.rs counting pass

File(s):

compiler/ori_llvm/src/verify/rc_histogram.rs (NEW — the raw counting pass)
compiler/ori_llvm/src/verify/mod.rs (add mod rc_histogram; and wire into audit_module_with_options)

Why a new file, not modifying rc_balance.rs: rc_balance.rs is a semantic lifecycle verifier — it tracks pointer identity and state transitions (Live → CowConsumed → Decremented). The histogram is a syntactic counter — it counts instruction occurrences without tracking pointer identity. These are fundamentally different concerns. Mixing them would create a LEAK:phase-bleeding (histogram counting polluting the state machine) and risk corrupting rc_balance.rs’s invariants. The RcOpKind enum provides a shared vocabulary without coupling the implementations.

Create compiler/ori_llvm/src/verify/rc_histogram.rs containing:
- RcOpKind enum: Alloc, Inc, Dec, Free, Cow — with Debug, Clone, Copy, PartialEq, Eq, Hash derives
- fn classify_rc_call(callee_name: &str) -> Option<RcOpKind> — maps runtime RC functions to operation kinds. Must cover ALL RC operations emitted by the codegen, not just the 5 base names:
  - Alloc: ori_rc_alloc, ori_list_alloc_data, ori_map_literal_alloc, ori_set_literal_alloc (collection literal allocation wrappers that call ori_rc_alloc internally)
  - Inc: ori_rc_inc, ori_str_rc_inc, ori_list_rc_inc (slice-aware typed RC inc)
  - Dec: ori_rc_dec, ori_str_rc_dec, ori_buffer_rc_dec, ori_map_buffer_rc_dec, ori_set_buffer_rc_dec (slice-aware typed RC dec)
  - Free: ori_rc_free, ori_list_free_data, ori_buffer_drop_unique, ori_set_buffer_drop_unique, ori_map_buffer_drop_unique (free wrappers and unique-drop functions that skip atomic dec and directly free — these are RC cleanup operations that must be counted as releases)
  - Cow: COW functions (via super::is_cow_function) — ori_list_*_cow, ori_str_*_cow, etc.
  - Non-counting (returns None): ori_rc_is_unique, ori_rc_is_unique_or_null, ori_rc_live_count, ori_rc_reset_live_count, ori_rc_realloc, ori_list_reset_buffer (internal reallocation — dec+alloc happen inside, not externally observable as separate RC events)
  - Source of truth for the function list: compiler/ori_llvm/src/codegen/runtime_decl/runtime_functions.rs. The classifier must cover all RC counting operations (functions that perform alloc/inc/dec/free on refcounts or COW mutations). NOT all functions containing rc — non-counting helpers must return None. Add an exhaustiveness test with an explicit allowlist of classified patterns (*_rc_inc, *_rc_dec, *_drop_unique, exact ori_rc_alloc, exact ori_rc_free, *_cow) and explicit None assertions for ori_rc_is_unique, ori_rc_live_count, ori_rc_realloc, ori_list_reset_buffer, and unrelated names like ori_str_to_uppercase. Include explicit test cases for ori_buffer_drop_unique → Free, ori_set_buffer_drop_unique → Free, ori_map_buffer_drop_unique → Free.
  - Note: The current awk parser in rc-stats.sh only matches the 5 base patterns (ori_rc_alloc/inc/dec/free + COW). The new classifier intentionally EXPANDS coverage to typed RC operations that the awk parser missed (e.g., ori_buffer_rc_dec, ori_str_rc_inc). The migration test matrix (Phase B) must account for this — the old awk totals may be LOWER than JSON totals for files with typed RC calls.
- pub(super) struct BlockHistogram { pub label: String, pub counts: [u32; 5] } (indexed by RcOpKind discriminant) — pub(super) so sibling module rc_stats.rs can read it
- pub(super) struct FunctionHistogram { pub name: String, pub blocks: Vec<BlockHistogram> } — same visibility rationale
- pub(super) fn collect_module_histogram(module: &Module<'_>, options: &AuditOptions) -> Vec<FunctionHistogram>: walks all functions/blocks, calls classify_rc_call on every call/invoke instruction, and accumulates counts per block. Note: pub(super) matches FunctionHistogram visibility; a pub fn would expose a less-visible type in a public signature (E0446 on older toolchains, the private_interfaces lint on Rust 1.74 and later, which fails the build when warnings are denied)
- Uses super::callee_name() (shared with rc_balance) to extract callee names
- Uses super::rc_balance::should_audit_fn() for function filtering (already pub(super))
- Function name demangling: The histogram MUST demangle Ori function names before storing them in FunctionHistogram.name. Reuse the canonical AOT demangler (aot/mangle/parse.rs or its public API) — this is the SSOT for name demangling in ori_llvm. The demangled format will differ from the awk parser’s legacy format (e.g., math.@add instead of @math.add) — this is intentional and correct: the compiler’s canonical demangling is the authoritative format, and introducing a second bespoke demangler would be LEAK:scattered-knowledge. The Phase B migration test must account for this format difference (compare function-level RC totals, not exact name strings; or normalize names in the comparison).
- For block labels: use BasicBlock::get_name() if non-empty; otherwise generate format!("bb_{}", block_index) as a fallback. Many LLVM basic blocks are unnamed (empty get_name()) — the fallback ensures every block has a printable identifier in the JSON output and the rc-stats.sh table
Add mod rc_histogram; to compiler/ori_llvm/src/verify/mod.rs
Add #[cfg(test)] mod tests; at the bottom of rc_histogram.rs (per CLAUDE.md — test bodies in sibling tests.rs, not inline)
File size check: rc_histogram.rs should be under 200 lines. The counting logic is simple — no state machine, just instruction classification and accumulation.
Add Rust unit tests in compiler/ori_llvm/src/verify/rc_histogram/tests.rs:
- test_classify_rc_call_alloc_returns_alloc — "ori_rc_alloc" → Some(RcOpKind::Alloc)
- test_classify_rc_call_inc_returns_inc — "ori_rc_inc" → Some(RcOpKind::Inc)
- test_classify_rc_call_dec_returns_dec — "ori_rc_dec" → Some(RcOpKind::Dec)
- test_classify_rc_call_free_returns_free — "ori_rc_free" → Some(RcOpKind::Free)
- test_classify_rc_call_cow_function_returns_cow — "ori_list_push_cow" → Some(RcOpKind::Cow)
- test_classify_rc_call_unrelated_returns_none — "puts" → None
- test_empty_module_produces_empty_histogram — synthetic inkwell module with no functions → empty vec
- test_module_with_rc_alloc_and_dec_counts_per_block — synthetic module with ori_rc_alloc in entry block and ori_rc_dec in exit block → correct per-block counts
- Test naming follows <subject>_<scenario>_<expected> (CLAUDE.md §Test function naming). No ephemeral identifiers.
Run timeout 150 cargo t -p ori_llvm -- rc_histogram to verify tests pass (19 tests pass)
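
The classification table in 04.1 can be sketched as a single match over exact names. This is an illustration under stated assumptions, not the code in rc_histogram.rs: the local `is_cow_function` is a stand-in for `super::is_cow_function`, and its `ori_` prefix / `_cow` suffix rule is a guess at that predicate's behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RcOpKind {
    Alloc,
    Inc,
    Dec,
    Free,
    Cow,
}

/// Stand-in for `super::is_cow_function` (assumed rule: `ori_*_cow`).
fn is_cow_function(name: &str) -> bool {
    name.starts_with("ori_") && name.ends_with("_cow")
}

/// Maps a runtime callee name to an RC operation kind, or `None` for
/// anything that is not an externally observable RC event.
pub fn classify_rc_call(callee_name: &str) -> Option<RcOpKind> {
    match callee_name {
        "ori_rc_alloc" | "ori_list_alloc_data" | "ori_map_literal_alloc"
        | "ori_set_literal_alloc" => Some(RcOpKind::Alloc),
        "ori_rc_inc" | "ori_str_rc_inc" | "ori_list_rc_inc" => Some(RcOpKind::Inc),
        "ori_rc_dec" | "ori_str_rc_dec" | "ori_buffer_rc_dec"
        | "ori_map_buffer_rc_dec" | "ori_set_buffer_rc_dec" => Some(RcOpKind::Dec),
        "ori_rc_free" | "ori_list_free_data" | "ori_buffer_drop_unique"
        | "ori_set_buffer_drop_unique" | "ori_map_buffer_drop_unique" => Some(RcOpKind::Free),
        name if is_cow_function(name) => Some(RcOpKind::Cow),
        // Non-counting helpers (ori_rc_is_unique, ori_rc_realloc, ...) and
        // unrelated functions fall through because only exact names match.
        _ => None,
    }
}
```

Matching exact names rather than substrings is what keeps helpers such as ori_rc_is_unique and ori_rc_realloc out of the counts, even though their names contain `rc`.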

CRITICAL: rc_balance.rs is NOT modified in this section. The lifecycle verifier continues to track only alloc/dec/cow as before. If future work wants the lifecycle verifier to also track inc/free, that is a separate change with its own state-machine analysis.
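
The reason the histogram can live apart from the lifecycle verifier is that it keeps no pointer state at all. The sketch below shows the per-block accumulation and the label fallback from 04.1 without inkwell: each block is modeled as a name plus a list of already-classified calls, and `count_blocks` and `block_label` are hypothetical helper names. The explicit discriminants are an assumption that fixes the array order as alloc, inc, dec, free, cow.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RcOpKind {
    Alloc = 0,
    Inc = 1,
    Dec = 2,
    Free = 3,
    Cow = 4,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockHistogram {
    pub label: String,
    /// Indexed by `RcOpKind` discriminant.
    pub counts: [u32; 5],
}

/// Uses the block's own name when it has one, otherwise `bb_<index>`.
pub fn block_label(name: &str, block_index: usize) -> String {
    if name.is_empty() {
        format!("bb_{}", block_index)
    } else {
        name.to_owned()
    }
}

/// Counts classified calls per block. `None` entries model calls that the
/// classifier rejected; they are skipped.
pub fn count_blocks(blocks: &[(&str, &[Option<RcOpKind>])]) -> Vec<BlockHistogram> {
    blocks
        .iter()
        .enumerate()
        .map(|(index, (name, calls))| {
            let mut counts = [0u32; 5];
            for kind in calls.iter().flatten() {
                counts[*kind as usize] += 1;
            }
            BlockHistogram { label: block_label(name, index), counts }
        })
        .collect()
}
```

Nothing here knows which pointer an operation applies to, so the pass cannot observe or disturb the Live/CowConsumed/Decremented transitions that rc_balance.rs owns.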

Subsection close-out (04.1) — MANDATORY before starting 04.2:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Retrospective 04.1: No tooling gaps. -D dead_code forced early wiring into audit_module_with_options(). Demangle re-export path (crate::aot::demangle) preferred over private module path.

04.2 Typed JSON schema structs with serde

File(s):

compiler/ori_llvm/src/verify/rc_stats.rs (NEW — typed JSON schema)
compiler/ori_llvm/Cargo.toml (add serde_json dependency)

Why typed structs, not format!(): ori_llvm/Cargo.toml already depends on serde = { version = "1", features = ["derive"] } (line 26). Adding serde_json for serialization ensures correct JSON escaping, stable field ordering, and compile-time schema enforcement. Manual format!() JSON emission is a LEAK/EXPOSURE risk: it needs hand-written escape handling for function names containing quotes or backslashes, invites format drift between emitter and consumer, and provides no compile-time schema contract.

Why a schema_version field: The JSON schema will be consumed by rc-stats.sh and potentially other tools. A version field enables backward-compatible schema evolution without breaking consumers. Version 1 is the initial schema defined here.

Add serde_json = "1" to [dependencies] in compiler/ori_llvm/Cargo.toml (serde is already present)

Create compiler/ori_llvm/src/verify/rc_stats.rs containing typed structs:

```
use serde::Serialize;

/// Schema version for backward compatibility. Bump when adding fields.
pub const SCHEMA_VERSION: u32 = 1;

/// Top-level RC stats report emitted as JSON.
#[derive(Debug, Clone, Serialize)]
pub struct RcStatsReport {
    pub schema_version: u32,
    /// Whether this report covers optimized or unoptimized IR.
    pub optimized: bool,
    pub functions: Vec<FunctionStats>,
}

/// Per-function RC operation stats.
#[derive(Debug, Clone, Serialize)]
pub struct FunctionStats {
    pub name: String,
    pub blocks: Vec<BlockStats>,
    /// Function-level totals (sum of all blocks).
    pub totals: OpCounts,
}

/// Per-basic-block RC operation counts.
#[derive(Debug, Clone, Serialize)]
pub struct BlockStats {
    pub label: String,
    pub counts: OpCounts,
}

/// Raw RC operation counts.
#[derive(Debug, Clone, Default, Serialize)]
pub struct OpCounts {
    pub alloc: u32,
    pub inc: u32,
    pub dec: u32,
    pub free: u32,
    pub cow: u32,
}
```

Add an associated constructor on RcStatsReport: impl RcStatsReport { pub(super) fn from_histograms(histograms: &[FunctionHistogram], optimized: bool) -> Self { ... } } — maps FunctionHistogram → FunctionStats with computed totals and sets optimized field. Note: pub(super) matches FunctionHistogram visibility
Add mod rc_stats; to compiler/ori_llvm/src/verify/mod.rs
Add #[cfg(test)] mod tests; at the bottom of rc_stats.rs (per CLAUDE.md — test bodies in sibling tests.rs)
Add impl RcStatsReport { pub fn emit_to_stderr(&self) } method that centralizes JSON serialization and emission: eprintln!("codegen stats: json: {}", serde_json::to_string(self).expect("RcStatsReport serialization")). All hook points (verify/mod.rs, aot/object.rs, build/single.rs) call this method instead of manually constructing the JSON emission — prevents algorithmic duplication of the formatting contract.
File size check: rc_stats.rs is 109 lines — under 120-line limit.
Add Rust unit tests in compiler/ori_llvm/src/verify/rc_stats/tests.rs:
- test_empty_histogram_produces_version_one_empty_functions — empty input → RcStatsReport { schema_version: 1, functions: [] }
- test_histogram_to_report_computes_function_totals — two blocks with known counts → totals are sums
- test_report_serializes_to_valid_json — serde_json::to_string(&report) succeeds and contains "schema_version":1
- test_function_name_with_special_chars_serializes_correctly — function name containing " and \ serializes without corruption (proves serde handles escaping)
Run timeout 150 cargo t -p ori_llvm -- rc_stats to verify tests pass (5 tests pass)
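
The from_histograms conversion described above amounts to a copy plus a running sum. The following sketch reproduces the types without the serde derives so that it compiles with std alone; `from_array` and `add` are hypothetical helpers, and the array order (alloc, inc, dec, free, cow) is an assumption about the RcOpKind discriminants.

```rust
pub const SCHEMA_VERSION: u32 = 1;

/// Histogram types from 04.1, reduced to the fields used here.
pub struct BlockHistogram {
    pub label: String,
    pub counts: [u32; 5], // assumed order: alloc, inc, dec, free, cow
}
pub struct FunctionHistogram {
    pub name: String,
    pub blocks: Vec<BlockHistogram>,
}

#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct OpCounts {
    pub alloc: u32,
    pub inc: u32,
    pub dec: u32,
    pub free: u32,
    pub cow: u32,
}

impl OpCounts {
    fn from_array(c: [u32; 5]) -> Self {
        OpCounts { alloc: c[0], inc: c[1], dec: c[2], free: c[3], cow: c[4] }
    }
    fn add(&mut self, other: OpCounts) {
        self.alloc += other.alloc;
        self.inc += other.inc;
        self.dec += other.dec;
        self.free += other.free;
        self.cow += other.cow;
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlockStats {
    pub label: String,
    pub counts: OpCounts,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionStats {
    pub name: String,
    pub blocks: Vec<BlockStats>,
    pub totals: OpCounts,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RcStatsReport {
    pub schema_version: u32,
    pub optimized: bool,
    pub functions: Vec<FunctionStats>,
}

impl RcStatsReport {
    /// Copies each block's counts and sums them into the function totals.
    pub fn from_histograms(histograms: &[FunctionHistogram], optimized: bool) -> Self {
        let functions = histograms
            .iter()
            .map(|f| {
                let mut totals = OpCounts::default();
                let blocks: Vec<BlockStats> = f
                    .blocks
                    .iter()
                    .map(|b| {
                        let counts = OpCounts::from_array(b.counts);
                        totals.add(counts);
                        BlockStats { label: b.label.clone(), counts }
                    })
                    .collect();
                FunctionStats { name: f.name.clone(), blocks, totals }
            })
            .collect();
        RcStatsReport { schema_version: SCHEMA_VERSION, optimized, functions }
    }
}
```

Computing totals once on the compiler side means consumers never have to re-derive function-level numbers from the block list.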
Subsection close-out (04.2) — MANDATORY before starting 04.3:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Retrospective 04.2: No tooling gaps. Wired from_histograms + emit_to_stderr directly into audit_module_with_options (04.3 early wiring, forced by -D dead_code).

04.3 Wire histogram into audit pipeline and emit JSON

File(s):

compiler/ori_llvm/src/verify/mod.rs (call histogram, emit JSON)
compiler/ori_llvm/src/verify/report.rs (add emit_rc_stats_json method to AuditReport or as a standalone function)

Prefix choice: "codegen stats: json:", NOT "codegen audit:". The existing codegen-audit.sh script (line 163) extracts lines via grep "^codegen audit:". Adding a JSON line with that same prefix would cause codegen-audit.sh to parse it as a garbled audit finding. Using the distinct prefix "codegen stats: json:" cleanly separates the two output streams. rc-stats.sh will grep for "^codegen stats: json:" to extract its data.
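
The stream separation can be checked mechanically. In this sketch, `stats_line` is a hypothetical formatter that mirrors the emission format, and `matches_audit_grep` approximates the anchored grep in codegen-audit.sh with a prefix test.

```rust
pub const AUDIT_PREFIX: &str = "codegen audit:";
pub const STATS_PREFIX: &str = "codegen stats: json:";

/// Formats one stats line the way the centralized emitter is described.
pub fn stats_line(json: &str) -> String {
    format!("{} {}", STATS_PREFIX, json)
}

/// Approximates `grep "^codegen audit:"` for a single line.
pub fn matches_audit_grep(line: &str) -> bool {
    line.starts_with(AUDIT_PREFIX)
}

/// Returns only the lines that codegen-audit.sh would pick up.
pub fn audit_lines(stderr: &str) -> Vec<&str> {
    stderr.lines().filter(|line| matches_audit_grep(line)).collect()
}
```

Because the two prefixes diverge at the second word, an anchored match on one can never select lines from the other.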

In audit_module_with_options() (verify/mod.rs), after the existing checks, call rc_histogram::collect_module_histogram(module, options) and convert to RcStatsReport via rc_stats::RcStatsReport::from_histograms()
Store the RcStatsReport in AuditReport (add a field: pub rc_stats: Option<rc_stats::RcStatsReport>) or return it alongside the report
Prerequisite: Change AuditOptions::from_env() visibility from fn from_env() to pub fn from_env() in compiler/ori_llvm/src/verify/mod.rs (line 55) — the optimized hook points in aot/object.rs and build/single.rs need to construct AuditOptions from the environment.
Optimized-IR histogram support (SSOT — eliminates awk parser entirely): Add a pub fn audit_module_histogram_only(module: &Module<'_>, options: &AuditOptions) -> RcStatsReport entry point that runs ONLY the histogram pass (no lifecycle/COW/ABI/safety checks), calling from_histograms(..., optimized: true). This enables callers to collect stats on the post-optimization module without running the full audit pass.
- Hook points (MUST be gated behind if verify::audit_requested()): The histogram emission for optimized IR must be wrapped in an if verify::audit_requested() { ... } block — without this gate, every normal ori build --release would run the histogram and spam stderr. The specific hook points are:
  - compiler/ori_llvm/src/aot/object.rs in verify_optimize_emit() — after run_optimization_passes() completes but before object emission. This covers normal AOT object builds.
  - compiler/oric/src/commands/build/single.rs — after optimize_module() for --emit=llvm-ir builds.
- Both paths emit a JSON line: codegen stats: json: {..."optimized":true...}
- Smoke test: ORI_AUDIT_CODEGEN=1 cargo run -p oric --bin ori -- build --release diagnostics/fixtures/clean.ori -o /tmp/test_bin 2>&1 | grep "codegen stats: json:" must produce TWO JSON lines: one with "optimized":false and one with "optimized":true.
- Non-audit builds must NOT run the histogram: ori build --release diagnostics/fixtures/clean.ori (without ORI_AUDIT_CODEGEN=1) must produce zero codegen stats: lines on stderr.
In the emit_to_stderr() method of AuditReport, after the existing text output, call the centralized emitter:
```
if let Some(ref stats) = self.rc_stats {
    stats.emit_to_stderr(); // Defined in rc_stats.rs — single JSON emission point
}
```
Do NOT inline serde_json::to_string + eprintln! here — the RcStatsReport::emit_to_stderr() method (from 04.2) owns the formatting contract. All other hook points (aot/object.rs, build/single.rs) also call report.emit_to_stderr() directly.
Verify codegen-audit.sh is unaffected: Verified — codegen audit: lines emitted separately from codegen stats: json: lines. Comment added in report.rs emitter.
Add Rust unit tests (in verify/tests.rs or a dedicated test):
- test_audit_module_populates_rc_stats_with_counts — synthetic module with RC calls → report.rc_stats is Some with correct counts
- test_audit_empty_module_rc_stats_has_no_functions — empty module → rc_stats.functions is empty, schema_version is 1
- test_rc_stats_json_prefix_does_not_match_codegen_audit_grep — the output string starts with "codegen stats: json:" (proves it does not start with "codegen audit:")
File size check: verify/mod.rs is 152 lines, report.rs is 144 lines — both under 200.
Run timeout 150 cargo t -p ori_llvm to verify no regressions (all 27 new tests pass)
Smoke test: Two JSON lines emitted (optimized:false + optimized:true). Non-audit builds produce zero lines.
Subsection close-out (04.3) — MANDATORY before starting 04.4:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Retrospective 04.3: No tooling gaps. pub use re-export pattern cleanly solves cross-module type visibility. Smoke test confirms gating works correctly — zero overhead in non-audit builds.

04.4 Update rc-stats.sh with --block-level and JSON migration

File(s): diagnostics/rc-stats.sh, diagnostics/self-test.sh

This subsection has three phases: (A) add --block-level using compiler JSON, (B) migration test matrix comparing awk vs JSON totals, (C) migrate ALL modes to compiler JSON and remove awk parser entirely.

Phase A: Add --block-level flag

Add --block-level flag to rc-stats.sh argument parser
When --block-level is passed (without --optimized):
1. Compile with ORI_AUDIT_CODEGEN=1, capture stderr (do NOT fail on nonzero compiler exit — audit findings cause nonzero exit but stats JSON is still emitted to stderr before the exit code is set)
2. Extract the correct codegen stats: json: line from stderr by filtering on the "optimized" field: for --block-level (no --optimized), select the line with "optimized":false; for --block-level --optimized, select the line with "optimized":true. When ORI_AUDIT_CODEGEN=1 and release builds are involved, stderr may contain TWO JSON lines (one per IR stage). If no matching stats JSON line is found AND the compiler exit was nonzero, THEN exit 2 (“compilation failed before stats pass”). If the matching JSON line IS found, proceed regardless of compiler exit code.
3. Parse JSON with python3 -c 'import sys,json; ...' (available on all target platforms; jq is optional)
4. Render a hierarchical table: Function > Block > alloc/inc/dec/free/cow/balance
5. Per-block balance shown for localization but does NOT affect exit code (RC ownership crosses block boundaries — a per-block imbalance is normal control flow, not a bug)
6. Function-level balance (sum of all blocks) determines exit code: 0 = all functions balanced, 1 = any function imbalanced (matches current behavior exactly)
7. Audit-error resilience: The stats pass runs before has_errors() triggers nonzero exit, so stats JSON is available even when audit findings exist. This makes rc-stats.sh useful on exactly the files developers need it for — files with RC issues.
--block-level --optimized is supported: uses the optimized JSON ("optimized": true) to render per-block stats for the post-optimization IR. Both flags compose naturally since both modes consume compiler JSON.
Add self-test entries:
- rc-stats.sh --block-level fixtures/clean.ori produces per-block output containing block labels
- rc-stats.sh --block-level --optimized fixtures/clean.ori produces per-block output from optimized IR
- rc-stats.sh --optimized fixtures/clean.ori produces function-level output from optimized IR
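
The line selection and exit-code rules from the numbered steps above are sketched here in Rust for consistency with the rest of this section, although rc-stats.sh itself uses shell and python3. The function names are hypothetical. The substring test on the "optimized" field relies on serde_json's compact output and is a shortcut for illustration; a real consumer should parse the JSON.

```rust
pub const STATS_PREFIX: &str = "codegen stats: json:";

/// Picks the stats JSON payload for the requested IR stage, if present.
pub fn select_stats_json(stderr: &str, optimized: bool) -> Option<&str> {
    let marker = if optimized { "\"optimized\":true" } else { "\"optimized\":false" };
    stderr
        .lines()
        .filter_map(|line| line.strip_prefix(STATS_PREFIX))
        .map(str::trim)
        .find(|json| json.contains(marker))
}

/// Exit code for rc-stats.sh given whether a matching stats line was found
/// and the function-level balances taken from it.
pub fn exit_code(stats_found: bool, function_balances: &[i64]) -> i32 {
    if !stats_found {
        // The plan specifies exit 2 when the compiler also exited nonzero.
        // Treating a missing line after a zero exit the same way is an
        // assumption of this sketch.
        return 2;
    }
    // Per-block balances are display-only; only function totals count.
    if function_balances.iter().any(|balance| *balance != 0) {
        1
    } else {
        0
    }
}
```

Note that the compiler's own exit status never appears in the success path: once the matching line is found, only function-level balances decide between 0 and 1.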

Phase B: Migration test matrix (awk vs JSON numeric equivalence)

Why: The awk parser counts RC ops by regex-matching LLVM IR text. The new histogram pass counts by walking inkwell’s in-memory IR. These MUST agree before we remove the awk parser. Subtle differences (e.g., the awk parser counting invoke calls that the histogram misses, or the histogram counting inlined COW calls the awk regex doesn’t match) would silently corrupt rc-stats.sh output.

Create a temporary --compare-awk flag (or internal validation mode) that runs BOTH the awk parser (on IR text) and the JSON parser (from compiler output) on the same file, then compares per-function totals
Test the comparison across multiple fixture files:
- diagnostics/fixtures/simple.ori — minimal program, few RC ops
- diagnostics/fixtures/clean.ori — program with RC operations
- diagnostics/fixtures/closure-capture.ori (if exists from Section 06, or create a minimal one) — closures exercise inc/dec paths
- At least one program with ori_rc_free calls (COW or drop path)
For the 5 base RC operations (ori_rc_alloc/inc/dec/free + COW), awk and JSON totals should match. For typed RC operations (ori_str_rc_inc, ori_buffer_rc_dec, etc.), JSON totals will be HIGHER because the new classifier covers operations the awk parser never counted — this is correct and expected, not a regression.
Document any divergence: clean.ori shows awk=0/json=1 for alloc and dec — accounted for by typed RC ops (ori_list_alloc_data, ori_str_rc_dec). simple.ori shows exact match (0/0).
No UNEXPECTED divergence found — all differences are typed RC ops the awk parser never counted.
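
One way to encode the Phase B expectation is a three-way comparison per count. The `Divergence` type and `compare_count` function are hypothetical. A `JsonHigher` result is only a candidate for the expected case: it still has to be attributed to typed RC operations by inspection, as was done for clean.ori.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Divergence {
    /// Both parsers agree.
    Match,
    /// JSON counted more; expected when typed RC ops are present.
    JsonHigher { by: u32 },
    /// awk counted more; the histogram missed something.
    Unexpected { awk: u32, json: u32 },
}

/// Compares one awk-derived count against the matching JSON count.
pub fn compare_count(awk: u32, json: u32) -> Divergence {
    if json == awk {
        Divergence::Match
    } else if json > awk {
        Divergence::JsonHigher { by: json - awk }
    } else {
        Divergence::Unexpected { awk, json }
    }
}

/// True when no count in the list is an `Unexpected` divergence.
pub fn no_unexpected(pairs: &[(u32, u32)]) -> bool {
    pairs
        .iter()
        .all(|(awk, json)| !matches!(compare_count(*awk, *json), Divergence::Unexpected { .. }))
}
```

With the documented numbers, the clean.ori alloc count (awk=0, json=1) yields `JsonHigher { by: 1 }` and the simple.ori counts (0, 0) yield `Match`.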
