Section 06: Verification

Status: Not Started

Goal: Prove that all ARC optimizations from Sections 01-05 are correct, produce observable behavior identical to the unoptimized baseline, maintain zero leaks, and deliver a measurable performance improvement.

Depends on: All prior sections.

06.1 Test Matrix

Build a comprehensive test matrix that exercises every optimization across every relevant type × pattern × CFG combination.

Section 01 (Statistics):
- Verify SynergyMetrics fields are correctly populated for all test programs
- rc_ops_post_emission > 0 for programs with heap types (str, [int], closures)
- rc_ops_post_emission == 0 for int-only programs
- coalesce_reduction_percent() returns a value within 0-100%
- CoalesceStats counts match manual IR inspection for at least 3 programs
- Statistics collection does not change program output (pure instrumentation)
Section 02 (Barriers):
- Type: str, [int], Option<str>, closures, {str: int} map, Set<str>
- Call pattern: all-borrowed, all-owned, mixed, no-contract (FFI)
- CFG: linear, loop body, conditional call
Section 03 (KnownSafe):
- Type: str, [int], Option<str>, closures, structs with heap fields
- Nesting: 1-deep (no elim), 2-deep (one pair), 3-deep (two pairs)
- CFG: linear, diamond (both arms increment), loop (invariant RC)
Section 04 (COW Contraction):
- Type: struct with mutable field, [int], {str: int}
- CowMode: StaticUnique, Dynamic, StaticShared
- Pattern: single mutation, loop mutation, conditional mutation
Section 05 (RC Motion):
- Type: str, [int], Option<str>, closures, maps
- CFG: diamond (retain in pred, release in succs), triangle, loop, nested loop, early return
- Pattern: same-block pair, cross-block pair, loop-invariant RC
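
The matrix above is a cross product of axes, so the cells can be generated rather than written out by hand. The following is a minimal sketch for the Section 02 (Barriers) axes only. `MatrixCell`, `cross_product`, and `barrier_cells` are hypothetical names chosen for illustration and are not part of the existing test suite.

```rust
/// One cell of the test matrix: a (type, pattern, CFG shape) combination.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MatrixCell {
    pub ty: &'static str,
    pub pattern: &'static str,
    pub cfg: &'static str,
}

/// Builds the full cross product of the three axes in a stable order
/// (types outermost, CFG shapes innermost).
pub fn cross_product(
    types: &[&'static str],
    patterns: &[&'static str],
    cfgs: &[&'static str],
) -> Vec<MatrixCell> {
    let mut cells = Vec::with_capacity(types.len() * patterns.len() * cfgs.len());
    for &ty in types {
        for &pattern in patterns {
            for &cfg in cfgs {
                cells.push(MatrixCell { ty, pattern, cfg });
            }
        }
    }
    cells
}

/// The Section 02 (Barriers) axes, copied from the matrix above.
pub fn barrier_cells() -> Vec<MatrixCell> {
    let types = ["str", "[int]", "Option<str>", "closures", "{str: int} map", "Set<str>"];
    let patterns = ["all-borrowed", "all-owned", "mixed", "no-contract (FFI)"];
    let cfgs = ["linear", "loop body", "conditional call"];
    cross_product(&types, &patterns, &cfgs)
}
```

With 6 types, 4 call patterns, and 3 CFG shapes, this yields 72 cells for Section 02. Generating the cells makes it easy to spot a combination that has no corresponding test program, which is the kind of entry 06.1.1 is meant to record.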

06.1.1 Discovered Gaps

| Gap | Roadmap Location | Test | Severity |
|-----|------------------|------|----------|
| (to be filled during implementation) | | | |

/tpr-review passed — independent review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.
Subsection close-out (06.1) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.1 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.1: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

06.2 Behavioral Equivalence

Verify that optimized programs produce output identical to that of unoptimized programs.

Implement ORI_SKIP_ARC_OPTS flag in the AIMS pipeline:
- Add environment variable check in run_aims_pipeline() to skip Sections 02-05 optimizations (selective barriers, KnownSafe elimination, COW contraction, RC motion) while keeping baseline RC emission
- This flag enables A/B comparison testing between optimized and unoptimized builds
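- Implementation: check std::env::var("ORI_SKIP_ARC_OPTS").is_ok() at pipeline entry; when the variable is set, skip the new passes but still run the existing baseline pipeline. Note that is_ok() tests only for presence, so any value (including "0") enables the skip. Use std::sync::OnceLock to avoid repeating the environment lookup for every function: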
```
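// Process-wide cache: the environment is read at most once.
static SKIP_ARC_OPTS: std::sync::OnceLock<bool> = std::sync::OnceLock::new();
// Inside run_aims_pipeline(): true when ORI_SKIP_ARC_OPTS is set to any valid Unicode value.
let skip = *SKIP_ARC_OPTS.get_or_init(|| std::env::var("ORI_SKIP_ARC_OPTS").is_ok());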
```
- Preferred alternative: store the flag in AimsPipelineConfig. Add skip_new_opts: bool and set it from the environment variable once in run_aims_pipeline_all(), which avoids per-function environment lookups without needing a static cache
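
The config-based variant can be sketched as follows. Only `AimsPipelineConfig`, `skip_new_opts`, and `ORI_SKIP_ARC_OPTS` come from this plan. The constructor names, the `planned_passes` helper, and the pass name strings are assumptions made for illustration, and the real config presumably carries other fields.

```rust
use std::ffi::OsString;

/// Hypothetical shape of the pipeline config; only `skip_new_opts` is from the plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AimsPipelineConfig {
    pub skip_new_opts: bool,
}

impl AimsPipelineConfig {
    /// Builds the config from the raw value of ORI_SKIP_ARC_OPTS.
    /// Any value counts as "set"; only an absent variable leaves the new passes on.
    pub fn from_env_value(value: Option<OsString>) -> Self {
        Self { skip_new_opts: value.is_some() }
    }

    /// Reads the environment once; intended to be called at pipeline entry.
    pub fn from_env() -> Self {
        Self::from_env_value(std::env::var_os("ORI_SKIP_ARC_OPTS"))
    }
}

/// Returns the passes to run for one function.
/// The pass names are placeholders standing in for Sections 02-05.
pub fn planned_passes(config: AimsPipelineConfig) -> Vec<&'static str> {
    let mut passes = vec!["baseline_rc_emission"];
    if !config.skip_new_opts {
        passes.extend([
            "selective_barriers",
            "known_safe_elimination",
            "cow_contraction",
            "rc_motion",
        ]);
    }
    passes
}
```

Splitting `from_env_value` from `from_env` keeps the decision logic testable without mutating the process environment, which matters when tests run in parallel.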
Build comparison harness:
- Compile each test program twice: with and without new optimizations
- Run both and compare stdout, stderr, exit code
- Report any mismatches
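
A comparison harness along these lines could look like the sketch below. It assumes the `ori run <file>` invocation used elsewhere in this section; `RunResult`, `run_ori`, and `mismatches` are hypothetical names, not existing harness code.

```rust
use std::process::Command;

/// Observable behavior of one program run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RunResult {
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
    /// None when the process was terminated by a signal.
    pub exit_code: Option<i32>,
}

/// Runs `ori run <test_path>`, with ORI_SKIP_ARC_OPTS=1 for the baseline
/// and with the variable explicitly removed for the optimized run.
pub fn run_ori(test_path: &str, skip_arc_opts: bool) -> std::io::Result<RunResult> {
    let mut cmd = Command::new("ori");
    cmd.arg("run").arg(test_path);
    if skip_arc_opts {
        cmd.env("ORI_SKIP_ARC_OPTS", "1");
    } else {
        // Guard against the variable leaking in from the parent shell.
        cmd.env_remove("ORI_SKIP_ARC_OPTS");
    }
    let out = cmd.output()?;
    Ok(RunResult {
        stdout: out.stdout,
        stderr: out.stderr,
        exit_code: out.status.code(),
    })
}

/// Lists which observable channels differ between the two runs.
pub fn mismatches(baseline: &RunResult, optimized: &RunResult) -> Vec<&'static str> {
    let mut diffs = Vec::new();
    if baseline.stdout != optimized.stdout {
        diffs.push("stdout");
    }
    if baseline.stderr != optimized.stderr {
        diffs.push("stderr");
    }
    if baseline.exit_code != optimized.exit_code {
        diffs.push("exit code");
    }
    diffs
}
```

A caller would invoke `run_ori(path, true)` and `run_ori(path, false)` for each test program and report the path whenever `mismatches` returns a non-empty list. Keeping the three channels separate makes the report say which kind of divergence occurred instead of only that one did.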
Apply to all Ori spec tests:
```
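shopt -s globstar  # bash only: lets ** match nested directories
for test in tests/spec/**/*.ori; do
    # Capture stdout and stderr together, then append the exit code.
    ORI_SKIP_ARC_OPTS=1 ori run "$test" > /tmp/baseline 2>&1; echo "exit=$?" >> /tmp/baseline
    ori run "$test" > /tmp/optimized 2>&1; echo "exit=$?" >> /tmp/optimized
    # Name the failing test so mismatches can be tracked.
    diff /tmp/baseline /tmp/optimized || echo "MISMATCH: $test"
done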
```
Additionally, run diagnostics/dual-exec-verify.sh, which compares interpreter output against LLVM output. Because the interpreter performs no ARC optimization, equivalence with it is a stronger guarantee than agreement between two LLVM builds. Use both methods.
Track and investigate every mismatch.
/tpr-review passed — independent review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.
Subsection close-out (06.2) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.2 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.2: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

06.3 Code Journey

Run /code-journey to test the full pipeline end-to-end with progressively complex programs.

Run /code-journey — journeys escalate until the compiler breaks down
All CRITICAL findings from journey results triaged (fixed or tracked)
Eval and AOT paths produce identical results for all passing journeys
Journey results archived in plans/code-journeys/
/tpr-review passed — independent review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.
Subsection close-out (06.3) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.3 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.3: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

06.4 Safety Verification

06.5 Performance Validation

RC operation reduction:
- Baseline: run ORI_LOG=ori_arc=info ori build tests/spec/ with Sections 02-05 disabled
- Optimized: run the same command with all sections enabled
- Target: at least 30% total RC operation reduction on typical programs
- Note: aims-compare.sh was removed because it relied on a non-existent --features aims flag. Measuring RC operations requires a new tool. As an interim approach, count RC ops with ORI_AUDIT_CODEGEN=1 + diagnostics/rc-stats.sh.
Compile-time overhead:
- Measure compilation time with and without new passes
- Target: < 5% compile-time regression
- Run cargo bench -p oric if parser/typeck benchmarks exist
Runtime performance:
- Benchmark programs in tests/benchmarks/
- Measure execution time with and without optimizations
- Target: measurable improvement on RC-heavy programs
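
The two numeric targets above reduce to simple percentage checks. The sketch below shows one way to express them. The function names are hypothetical, and the inputs (RC op counts, compile times in milliseconds) are assumed to come from whatever measurement tooling is eventually used.

```rust
/// Percentage reduction from `baseline` to `optimized`.
/// Negative when `optimized` is larger; None when `baseline` is zero.
pub fn reduction_percent(baseline: u64, optimized: u64) -> Option<f64> {
    if baseline == 0 {
        return None;
    }
    Some((baseline as f64 - optimized as f64) * 100.0 / baseline as f64)
}

/// RC operation target: at least a 30% reduction in total RC operations.
pub fn meets_rc_target(baseline_ops: u64, optimized_ops: u64) -> bool {
    matches!(reduction_percent(baseline_ops, optimized_ops), Some(p) if p >= 30.0)
}

/// Compile-time target: the new passes add strictly less than 5% overhead.
pub fn compile_time_ok(baseline_ms: u64, with_passes_ms: u64) -> bool {
    match reduction_percent(baseline_ms, with_passes_ms) {
        // Overhead is the negated reduction.
        Some(p) => -p < 5.0,
        None => false,
    }
}
```

For instance, going from 1000 RC operations to 700 is exactly a 30% reduction and meets the target, while 1000 to 710 (29%) does not. A zero baseline is treated as a failure here so that an empty measurement cannot pass silently; an int-only program with no RC operations would need to be excluded from the sample before applying the check.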
/tpr-review passed — independent review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.
Subsection close-out (06.5) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.5 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.5: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

06.6 Documentation and Cleanup

06.7 Completion Checklist

Exit Criteria: All test programs produce identical output with and without optimizations. Zero leaks. Zero valgrind errors. Zero codegen audit findings. ./test-all.sh passes with 0 regressions across all ~N tests. RC operation reduction measured and documented. Compile-time overhead < 5%.