Section 06: Verification
Status: Not Started
Goal: Prove that all ARC optimizations from Sections 01-05 are correct, produce identical observable behavior to the unoptimized baseline, maintain zero leaks, and provide measurable performance improvement.
Depends on: All prior sections.
06.1 Test Matrix
Build a comprehensive test matrix covering every optimization through every relevant type × pattern × CFG combination.
-
Section 01 (Statistics):
- Verify `SynergyMetrics` fields are correctly populated for all test programs
- `rc_ops_post_emission > 0` for programs with heap types (str, [int], closures)
- `rc_ops_post_emission == 0` for int-only programs
- `coalesce_reduction_percent()` returns plausible values (0-100%)
- `CoalesceStats` counts match manual IR inspection for 3+ programs
- Statistics do not regress program output (pure instrumentation)
-
Section 02 (Barriers):
- Type: str, [int], Option<str>, closures, {str: int} map, Set<str>
- Call pattern: all-borrowed, all-owned, mixed, no-contract (FFI)
- CFG: linear, loop body, conditional call
-
Section 03 (KnownSafe):
- Type: str, [int], Option<str>, closures, structs with heap fields
- Nesting: 1-deep (no elim), 2-deep (one pair), 3-deep (two pairs)
- CFG: linear, diamond (both arms increment), loop (invariant RC)
-
Section 04 (COW Contraction):
- Type: struct with mutable field, [int], {str: int}
- CowMode: StaticUnique, Dynamic, StaticShared
- Pattern: single mutation, loop mutation, conditional mutation
-
Section 05 (RC Motion):
- Type: str, [int], Option<str>, closures, maps
- CFG: diamond (retain in pred, release in succs), triangle, loop, nested loop, early return
- Pattern: same-block pair, cross-block pair, loop-invariant RC
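To make the coverage claim checkable, the matrix can be generated rather than listed by hand. The following is a minimal sketch for the Section 03 (KnownSafe) axes only. `MatrixCell`, `known_safe_matrix`, and `expected_eliminated_pairs` are hypothetical names, not existing test infrastructure; the axis values and the expected pair counts are copied from the bullets above.

```rust
// Hypothetical sketch: enumerates the Section 03 (KnownSafe) test matrix.
// All type and function names here are illustrative assumptions.

/// One cell of the matrix: a single type x nesting x CFG combination.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MatrixCell {
    ty: &'static str,
    nesting: &'static str,
    cfg: &'static str,
}

const TYPES: [&str; 5] = ["str", "[int]", "Option<str>", "closure", "struct with heap fields"];
const NESTINGS: [&str; 3] = ["1-deep", "2-deep", "3-deep"];
const CFGS: [&str; 3] = ["linear", "diamond", "loop"];

/// Build the full Cartesian product so no combination is skipped by hand.
fn known_safe_matrix() -> Vec<MatrixCell> {
    let mut cells = Vec::with_capacity(TYPES.len() * NESTINGS.len() * CFGS.len());
    for &ty in TYPES.iter() {
        for &nesting in NESTINGS.iter() {
            for &cfg in CFGS.iter() {
                cells.push(MatrixCell { ty, nesting, cfg });
            }
        }
    }
    cells
}

/// Expected number of eliminated retain/release pairs per nesting depth,
/// as stated in the matrix: 1-deep -> 0, 2-deep -> 1, 3-deep -> 2.
fn expected_eliminated_pairs(nesting: &str) -> Option<u32> {
    match nesting {
        "1-deep" => Some(0),
        "2-deep" => Some(1),
        "3-deep" => Some(2),
        _ => None,
    }
}

fn main() {
    let cells = known_safe_matrix();
    println!("{} KnownSafe matrix cells", cells.len());
    for cell in &cells {
        let pairs = expected_eliminated_pairs(cell.nesting);
        println!("{:?} -> expected eliminated pairs: {:?}", cell, pairs);
    }
}
```

The same pattern would extend to the other sections by swapping the axis arrays; each generated cell then maps to one test program or one parameterized test case.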
06.1.1 Discovered Gaps
| Gap | Roadmap Location | Test | Severity |
|---|---|---|---|
| (to be filled during implementation) | | | |
-
`/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
-
`/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
-
Subsection close-out (06.1): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-06.1 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.1: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
-
`/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
-
Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
06.2 Behavioral Equivalence
Verify that optimized programs produce the same observable output (stdout, stderr, exit code) as their unoptimized counterparts.
-
Implement `ORI_SKIP_ARC_OPTS` flag in the AIMS pipeline:
- Add environment variable check in `run_aims_pipeline()` to skip Sections 02-05 optimizations (selective barriers, KnownSafe elimination, COW contraction, RC motion) while keeping baseline RC emission
- This flag enables A/B comparison testing between optimized and unoptimized builds
- Implementation: check `std::env::var("ORI_SKIP_ARC_OPTS").is_ok()` at pipeline entry; when set, skip the new passes but run the existing baseline pipeline. Use `std::sync::OnceLock` to avoid repeated env var lookups per function:
  ```rust
  // Read the environment once per process and cache the answer.
  static SKIP_ARC_OPTS: std::sync::OnceLock<bool> = std::sync::OnceLock::new();
  let skip = *SKIP_ARC_OPTS.get_or_init(|| std::env::var("ORI_SKIP_ARC_OPTS").is_ok());
  ```
- The `AimsPipelineConfig` is a better home for this flag: add `skip_new_opts: bool` and set it from env var once in `run_aims_pipeline_all()`, avoiding per-function env var reads
-
Build comparison harness:
- Compile each test program twice: with and without new optimizations
- Run both and compare stdout, stderr, exit code
- Report any mismatches
-
Apply to all Ori spec tests:
```bash
shopt -s globstar  # bash: let ** match nested directories
for test in tests/spec/**/*.ori; do
    # Baseline: new ARC optimizations skipped
    ORI_SKIP_ARC_OPTS=1 ori run "$test" > /tmp/baseline
    # Optimized: default pipeline
    ori run "$test" > /tmp/optimized
    diff /tmp/baseline /tmp/optimized || echo "MISMATCH: $test"
done
```
Additionally, use `diagnostics/dual-exec-verify.sh`, which compares interpreter vs LLVM output. Behavioral equivalence with the interpreter (which has no ARC optimization) is a stronger guarantee than comparing two LLVM builds. Both methods should be used.
-
Track and investigate every mismatch.
-
`/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
-
`/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
-
Subsection close-out (06.2): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-06.2 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.2: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
-
`/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
-
Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
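The comparison harness described in this subsection (run each program twice, compare stdout, stderr, and exit code, report mismatches) could look roughly like the sketch below. It assumes an `ori` binary on `PATH` that accepts `ori run <file>`, as in the shell loop above; `RunOutcome`, `run_ori`, and `mismatches` are hypothetical names, not existing harness code.

```rust
use std::path::Path;
use std::process::Command;

/// Everything the harness treats as observable behavior of one run.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RunOutcome {
    stdout: Vec<u8>,
    stderr: Vec<u8>,
    exit_code: Option<i32>, // None if the process was terminated by a signal
}

/// Run `ori run <test>`. When `skip_opts` is true, ORI_SKIP_ARC_OPTS=1 is set
/// for the child; otherwise the variable is removed so the parent's
/// environment cannot leak into the optimized run.
fn run_ori(test: &Path, skip_opts: bool) -> std::io::Result<RunOutcome> {
    let mut cmd = Command::new("ori"); // assumption: `ori` is on PATH
    cmd.arg("run").arg(test);
    if skip_opts {
        cmd.env("ORI_SKIP_ARC_OPTS", "1");
    } else {
        cmd.env_remove("ORI_SKIP_ARC_OPTS");
    }
    let out = cmd.output()?;
    Ok(RunOutcome {
        stdout: out.stdout,
        stderr: out.stderr,
        exit_code: out.status.code(),
    })
}

/// Names of the channels on which two runs differ (empty means equivalent).
fn mismatches(baseline: &RunOutcome, optimized: &RunOutcome) -> Vec<&'static str> {
    let mut diffs = Vec::new();
    if baseline.stdout != optimized.stdout {
        diffs.push("stdout");
    }
    if baseline.stderr != optimized.stderr {
        diffs.push("stderr");
    }
    if baseline.exit_code != optimized.exit_code {
        diffs.push("exit code");
    }
    diffs
}

fn main() {
    // Test paths are taken from the command line; with no arguments this is a no-op.
    let mut failed = 0u32;
    for arg in std::env::args().skip(1) {
        let test = Path::new(&arg);
        match (run_ori(test, true), run_ori(test, false)) {
            (Ok(base), Ok(opt)) => {
                let diffs = mismatches(&base, &opt);
                if !diffs.is_empty() {
                    failed += 1;
                    println!("MISMATCH {}: {}", arg, diffs.join(", "));
                }
            }
            (Err(e), _) | (_, Err(e)) => {
                failed += 1;
                println!("ERROR {}: {}", arg, e);
            }
        }
    }
    if failed > 0 {
        std::process::exit(1);
    }
}
```

Keeping the comparison in a pure function (`mismatches`) separates the equivalence rule from process spawning, so the rule itself can be unit-tested without a compiler build.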
06.3 Code Journey
Run /code-journey to test the full pipeline end-to-end with progressively complex programs.
-
Run `/code-journey`: journeys escalate until the compiler breaks down
-
All CRITICAL findings from journey results triaged (fixed or tracked)
-
Eval and AOT paths produce identical results for all passing journeys
-
Journey results archived in `plans/code-journeys/`
-
`/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
-
`/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
-
Subsection close-out (06.3): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-06.3 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.3: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
-
`/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
-
Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
06.4 Safety Verification
-
RC balance: `diagnostics/rc-stats.sh` on all test programs → balanced
-
Leak check: `ORI_CHECK_LEAKS=1` on all test programs → zero leaks
-
Valgrind: `diagnostics/valgrind-aot.sh` on representative programs → no memory errors
-
Codegen audit: `ORI_AUDIT_CODEGEN=1 ORI_AUDIT_STRICT=1` on all test programs → no findings
-
Stress test: 1000+ allocations, 100+ recursion depth, 1000+ list elements, all clean
-
`/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
-
`/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
-
Subsection close-out (06.4): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-06.4 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.4: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
-
`/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
-
Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
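The "balanced" and "zero leaks" criteria above can be stated as an invariant over a stream of RC events: no object is retained or released after its count reaches zero, and every count is zero at exit. The sketch below illustrates that invariant only. The `RcEvent` trace format and `check_balance` are hypothetical and do not describe how `rc-stats.sh` or `ORI_CHECK_LEAKS=1` actually work.

```rust
use std::collections::HashMap;

/// Hypothetical RC trace event, keyed by an opaque object id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RcEvent {
    Alloc(u64),   // object created with refcount 1
    Retain(u64),  // refcount += 1
    Release(u64), // refcount -= 1
}

#[derive(Debug, PartialEq, Eq)]
enum RcViolation {
    /// Retain or release on an object that is unknown or already at count 0.
    UseAfterFree { id: u64, event_index: usize },
    /// Object still has a positive count after the last event.
    Leak { id: u64, remaining: u32 },
}

/// Replay the events and report every violation of the balance invariant.
fn check_balance(events: &[RcEvent]) -> Vec<RcViolation> {
    let mut counts: HashMap<u64, u32> = HashMap::new();
    let mut violations = Vec::new();
    for (i, ev) in events.iter().enumerate() {
        match *ev {
            RcEvent::Alloc(id) => {
                counts.insert(id, 1);
            }
            RcEvent::Retain(id) => match counts.get_mut(&id) {
                Some(c) if *c > 0 => *c += 1,
                _ => violations.push(RcViolation::UseAfterFree { id, event_index: i }),
            },
            RcEvent::Release(id) => match counts.get_mut(&id) {
                Some(c) if *c > 0 => *c -= 1,
                _ => violations.push(RcViolation::UseAfterFree { id, event_index: i }),
            },
        }
    }
    // Report leaks in id order so the output is deterministic.
    let mut leaked: Vec<(u64, u32)> = counts.into_iter().filter(|&(_, c)| c > 0).collect();
    leaked.sort();
    for (id, remaining) in leaked {
        violations.push(RcViolation::Leak { id, remaining });
    }
    violations
}

fn main() {
    let trace = [
        RcEvent::Alloc(1),
        RcEvent::Retain(1),
        RcEvent::Release(1),
        RcEvent::Release(1),
    ];
    println!("violations: {:?}", check_balance(&trace));
}
```

An optimization that removes a retain/release pair leaves the replay result unchanged, while one that removes only half of a pair shows up as either a `Leak` or a `UseAfterFree`.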
06.5 Performance Validation
-
RC operation reduction:
- Baseline: `ORI_LOG=ori_arc=info ori build tests/spec/` with Sections 02-05 disabled
- Optimized: same with all sections enabled
- Target: 30%+ total RC operation reduction on typical programs
- Note: `aims-compare.sh` was removed (used non-existent `--features aims`). RC operation measurement requires a new tool; use `ORI_AUDIT_CODEGEN=1` + `diagnostics/rc-stats.sh` as an interim approach for counting RC ops.
-
Compile-time overhead:
- Measure compilation time with and without new passes
- Target: < 5% compile-time regression
- Run `cargo bench -p oric` if parser/typeck benchmarks exist
-
Runtime performance:
- Benchmark programs in `tests/benchmarks/`
- Measure execution time with and without optimizations
- Target: measurable improvement on RC-heavy programs
-
`/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
-
`/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
-
Subsection close-out (06.5): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-06.5 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.5: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
-
`/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
-
Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
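The two numeric targets in this subsection (30%+ RC operation reduction, under 5% compile-time regression) reduce to simple ratios against the baseline. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the inputs in `main` are illustrative placeholders, not measurements.

```rust
/// Percentage of RC operations removed relative to the baseline count.
/// Returns None when the baseline has no RC operations (ratio undefined).
fn rc_reduction_percent(baseline_ops: u64, optimized_ops: u64) -> Option<f64> {
    if baseline_ops == 0 {
        return None;
    }
    let removed = baseline_ops as f64 - optimized_ops as f64;
    Some(removed * 100.0 / baseline_ops as f64)
}

/// Percentage by which compile time grew relative to the baseline.
/// Negative values mean the optimized build compiled faster.
fn compile_regression_percent(baseline_secs: f64, optimized_secs: f64) -> Option<f64> {
    if baseline_secs <= 0.0 {
        return None;
    }
    Some((optimized_secs - baseline_secs) * 100.0 / baseline_secs)
}

/// Targets from this plan: at least 30% RC reduction, under 5% compile regression.
fn meets_targets(rc_reduction: f64, compile_regression: f64) -> bool {
    rc_reduction >= 30.0 && compile_regression < 5.0
}

fn main() {
    // Illustrative inputs only; real values come from the measurement steps above.
    let rc = rc_reduction_percent(1_000, 650);
    let ct = compile_regression_percent(10.0, 10.3);
    if let (Some(rc), Some(ct)) = (rc, ct) {
        println!("RC reduction: {:.1}%", rc);
        println!("Compile-time regression: {:.1}%", ct);
        println!("Targets met: {}", meets_targets(rc, ct));
    }
}
```

Returning `None` for an empty baseline keeps int-only programs, which have no RC operations to reduce, from being reported as a 0% reduction that fails the target.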
06.6 Documentation and Cleanup
-
Update `CLAUDE.md` with new pipeline steps if applicable
-
Update `.claude/rules/arc.md` with new pass descriptions
-
Update `compiler/ori_arc/src/aims/mod.rs` module docs
-
Add architecture notes to `compiler/ori_arc/src/aims/knownsafe/mod.rs`
-
Add architecture notes to `compiler/ori_arc/src/aims/rc_motion/mod.rs`
-
Plan annotation cleanup: strip ALL code annotations referencing this plan (`clang-arc-lessons`, section numbers `01-06`, any plan-specific markers) from all source files. Run `bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan clang-arc-lessons` and verify 0 annotations remain. Only spec references (`Spec: Clause N.M`) are permanent.
-
`/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
-
`/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
-
Subsection close-out (06.6): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-06.6 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.6: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
-
`/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
-
Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
06.7 Completion Checklist
- Test matrix covers all sections (01-05) × types × patterns × CFG combinations
- `ORI_SKIP_ARC_OPTS` flag implemented and functional
- Behavioral equivalence verified: 0 mismatches across all spec tests (both A/B and dual-exec)
- Code journey passes: eval/AOT match, no CRITICAL findings
- RC balance verified via `rc-stats.sh`: all functions balanced
- `ORI_CHECK_LEAKS=1`: zero leaks on all test programs
- Valgrind clean on representative programs
- Codegen audit clean (`ORI_AUDIT_STRICT=1`)
- Stress tests pass
- 30%+ total RC operation reduction on typical programs
- < 5% compile-time regression
- All documentation updated
- Plan annotation cleanup: `plan-annotations.sh` returns 0 annotations for this plan
- `./test-all.sh` green
- `./clippy-all.sh` green
- `/tpr-review` passed: independent Codex review clean
- `/impl-hygiene-review` passed: hygiene review clean
- `/improve-tooling` retrospective completed: MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which `diagnostics/` scripts you ran, which command sequences you repeated, where you added ad-hoc `dbg!`/`tracing` calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE `/commit-push`. The retrospective is mandatory even when nothing felt painful; that is exactly when blind spots accumulate. See `.claude/skills/improve-tooling/SKILL.md` “Retrospective Mode” for the full protocol.
Exit Criteria: All test programs produce identical output with and without optimizations. Zero leaks. Zero valgrind errors. Zero codegen audit findings. ./test-all.sh passes with 0 regressions across all ~N tests. RC operation reduction measured and documented. Compile-time overhead < 5%.