Section 01: Remove aims-compare + Create debug-release-compare
Status: Complete
Goal: Replace the dead AIMS comparison scripts with a new debug-vs-release comparison tool that catches FastISel-only bugs and optimization-dependent behavioral divergences.
Success Criteria:
- aims-compare.sh, aims-baseline.sh, and aims-measure.sh deleted
- New debug-release-compare.sh compiles and runs a program through both target/debug/ori and target/release/ori, comparing exit codes and stdout
- On mismatch, auto-dumps LLVM IR from both builds for diffing
- self-test.sh passes with new debug-release-compare test entries (28/28 passed)
- Satisfies mission criterion: “aims-compare.sh removed; new debug-release-compare.sh functional”
Context: aims-compare.sh passes --features aims (line 177), but that feature no longer exists. It was removed when AIMS became the default pipeline, so the script fails immediately on any invocation. Codex verified: cargo build -p oric --features aims fails with “package does not contain this feature: aims”. Keeping the aims-compare name after AIMS became the default is DRIFT per impl-hygiene.md. The debug-vs-release capability itself is genuinely useful: the debug build uses LLVM FastISel while the release build runs the full optimization pipeline, so the two can diverge in behavior.
Reference implementations:
- Swift verifier: runs same input through debug and release SIL pipelines to catch optimization-dependent bugs
Depends on: None.
01.1 Remove dead AIMS comparison scripts and stale references
File(s): diagnostics/aims-compare.sh, diagnostics/aims-baseline.sh, diagnostics/aims-measure.sh, CLAUDE.md, .claude/rules/arc.md, queued plan files
These three scripts (~900 lines total) are dead code. aims-compare.sh (347 lines) fails at line 177 (--features aims removed). aims-baseline.sh (244 lines) and aims-measure.sh (292 lines) are orphaned support scripts only called by aims-compare.sh.
IMPORTANT — Semantic mismatch: The old aims-compare.sh compared output + RC counts across AIMS pipeline variants (behavioral + RC parity). The new debug-release-compare.sh compares debug vs release builds (exit codes + stdout + LLVM IR on mismatch). These are fundamentally different tools answering different questions. References to aims-compare.sh must NOT be blindly renamed — each consumer must be audited for whether debug-release-compare.sh is the correct replacement or whether the reference should simply be removed.
- Delete diagnostics/aims-compare.sh (347 lines)
- Delete diagnostics/aims-baseline.sh (244 lines)
- Delete diagnostics/aims-measure.sh (292 lines)
- Verify diagnostics/self-test.sh contains no aims-compare references (confirmed: none exist)
- Verify diagnostics/README.md contains no aims-compare references (confirmed: none exist)
- Remove the CLAUDE.md line 152 aims-compare reference entirely: removed, leaving the AIMS lattice description without a dead tool reference
- Remove the .claude/rules/arc.md line 174 aims-compare reference: replaced with a debug-release-compare.sh reference (accurate semantics: debug vs release comparison, NOT RC comparison)
- Audit and fix stale cross-plan references (both plans are status: queued, not active):
  - plans/locality-representation-unification/section-05-verification.md lines 19, 91, 233: updated with notes about the removal and the semantic difference
  - plans/clang-arc-lessons/section-06-verification.md line 160: updated with a note about the removal and the interim RC measurement approach
- Verify diagnostics/self-test.sh still passes after removal (24/24 passed)
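The "no aims-compare references" checks above can also be scripted instead of confirmed by hand. The following is a minimal sketch that assumes a grep supporting -r and -E (GNU or BSD). The function name check_no_stale_refs is hypothetical. The example call leaves out plans/ on purpose, because the queued plan files now mention aims-compare in their removal notes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail if any stale script reference survives in the given paths.
# Returns 0 = clean, 1 = stale references found (printed), 2 = grep error.
check_no_stale_refs() {
    local pattern="$1"
    shift
    local existing=() p
    # Skip paths that do not exist so the check works on partial checkouts.
    for p in "$@"; do
        [[ -e "$p" ]] && existing+=("$p")
    done
    (( ${#existing[@]} == 0 )) && return 0

    local rc=0
    grep -rnE -- "$pattern" "${existing[@]}" || rc=$?
    case "$rc" in
        0) echo "error: stale references found for pattern: $pattern" >&2; return 1 ;;
        1) return 0 ;;   # grep exit 1 means no match: clean
        *) return 2 ;;   # grep exit 2 means an I/O or usage error
    esac
}

# Example (run from the repo root):
# check_no_stale_refs 'aims-(compare|baseline|measure)' CLAUDE.md .claude/rules diagnostics
```

The pattern matches the script names and not the bare word "aims", so the remaining AIMS lattice description in CLAUDE.md does not trigger a false positive.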
Subsection close-out (01.1) — MANDATORY before starting 01.2:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection. Retrospective 01.1: no tooling gaps. Deletion + reference cleanup only; self-test.sh verification sufficient.
01.2 Create debug-release-compare.sh
File(s): diagnostics/_common.sh (extend), diagnostics/debug-release-compare.sh (new), diagnostics/self-test.sh, diagnostics/README.md, CLAUDE.md, .claude/rules/arc.md
Create a new script that compiles and runs a program through both debug and release builds, comparing behavioral output. This catches FastISel-only bugs (the >16B aggregate load issue) and optimization-dependent codegen divergences.
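The following Bash function is a minimal sketch of that core comparison. It is illustrative and is not the actual debug-release-compare.sh. The function names ori_compile, run_one, and compare_builds are hypothetical. ori_compile assumes a placeholder invocation (binary, then build SRC -o EXE) in place of however the real script compiles a program. The sketch keeps compile failures separate from program exit codes, so a build that cannot compile maps to exit 2 (infrastructure error) rather than 1 (mismatch). The LLVM IR dump on mismatch and the --verbose RC statistics are left out.

```bash
#!/usr/bin/env bash
# Sketch of the debug-vs-release comparison core. All function names are
# hypothetical; this is not the real diagnostics/debug-release-compare.sh.

# ASSUMPTION: "<ori> build <src> -o <exe>" is a placeholder for however the
# real script compiles a program with a given ori binary.
ori_compile() {
    local bin="$1" src="$2" exe="$3"
    "$bin" build "$src" -o "$exe"
}

# Compile and run one build. Writes the program's stdout to $4 and prints its
# exit code. Returns 2 if compilation itself fails (an infrastructure error).
run_one() {
    local bin="$1" src="$2" exe="$3" out="$4"
    if ! ori_compile "$bin" "$src" "$exe" >/dev/null 2>&1; then
        return 2
    fi
    local rc=0
    "$exe" >"$out" 2>/dev/null || rc=$?
    echo "$rc"
}

# Returns 0 = match, 1 = mismatch (exit code or stdout), 2 = infrastructure error.
compare_builds() {
    local debug_bin="$1" release_bin="$2" src="$3"
    local tmp drc rrc status=0
    tmp="$(mktemp -d)" || return 2

    if ! drc="$(run_one "$debug_bin" "$src" "$tmp/debug.exe" "$tmp/debug.out")"; then
        echo "error: debug build failed to compile $src" >&2
        rm -rf "$tmp"
        return 2
    fi
    if ! rrc="$(run_one "$release_bin" "$src" "$tmp/release.exe" "$tmp/release.out")"; then
        echo "error: release build failed to compile $src" >&2
        rm -rf "$tmp"
        return 2
    fi

    if [[ "$drc" != "$rrc" ]]; then
        echo "MISMATCH: exit code (debug=$drc, release=$rrc)"
        status=1
    fi
    # diff prints a unified diff of the two stdouts and exits 1 when they differ.
    if ! diff -u "$tmp/debug.out" "$tmp/release.out"; then
        echo "MISMATCH: stdout differs"
        status=1
    fi

    rm -rf "$tmp"
    return "$status"
}
```

A caller would run compare_builds "$ORI_DEBUG" "$ORI_RELEASE" prog.ori and pass its return value to exit, which gives the documented 0/1/2 contract. Stderr is intentionally not compared. A panic still shows up as an exit-code difference.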
- Extend diagnostics/_common.sh with profile-specific binary resolution:
  - Added find_ori_bin_profile(profile) function with existence + LLVM check + clear error messages
  - Added require_both_builds() helper that sets ORI_DEBUG and ORI_RELEASE
  - Existing find_ori_bin() unchanged
- Create diagnostics/debug-release-compare.sh with:
  - --help, --no-color, --color, --verbose (standard options)
  - Uses require_both_builds() from _common.sh at startup
  - Compiles and runs the program through both builds, compares exit codes and stdout
  - On mismatch: auto-dumps LLVM IR from both builds via ORI_BIN override and shows the diff
  - --verbose also shows RC stats from both builds
  - Exit codes: 0 = match, 1 = mismatch, 2 = usage/infrastructure error
- Add self-test entries to diagnostics/self-test.sh:
  - Skip with a message if the release binary is unavailable (safer than rename-based testing)
  - simple.ori and clean.ori produce matching output (happy path, 2 tests)
  - --help shows usage
  - No-args error handling (exit 2)
- Add documentation to diagnostics/README.md with usage examples and workflow
- Add new reference in CLAUDE.md diagnostic scripts section: debug-release-compare.sh
- Add new reference in .claude/rules/arc.md line 174 (done in 01.1 as part of the replacement)
- Run diagnostics/self-test.sh --verbose: 28 passed, 0 failed
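The following is a hypothetical sketch of the profile-specific resolution added to _common.sh, reconstructed only from the bullet descriptions above. It assumes that SCRIPT_DIR points at diagnostics/ (the convention noted in TPR-01-002-gemini) and that builds live at target/<profile>/ori relative to the repository root. The real helper's LLVM check is left out because its details are not described here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of profile-aware binary resolution for diagnostics/_common.sh.
# ASSUMPTIONS: SCRIPT_DIR is the diagnostics/ directory and binaries live at
# <repo>/target/<profile>/ori. The real helper's LLVM check is NOT reproduced.

# Print the path of the ori binary for one Cargo profile, or fail with exit 2.
find_ori_bin_profile() {
    local profile="$1"
    case "$profile" in
        debug|release) ;;
        *)
            echo "error: unknown profile '$profile' (expected debug or release)" >&2
            return 2
            ;;
    esac
    local bin="${SCRIPT_DIR}/../target/${profile}/ori"
    if [[ ! -x "$bin" ]]; then
        echo "error: $profile ori binary not found at $bin (build the $profile profile first)" >&2
        return 2
    fi
    printf '%s\n' "$bin"
}

# Resolve both builds up front and export them as ORI_DEBUG / ORI_RELEASE.
require_both_builds() {
    ORI_DEBUG="$(find_ori_bin_profile debug)" || return 2
    ORI_RELEASE="$(find_ori_bin_profile release)" || return 2
    export ORI_DEBUG ORI_RELEASE
}
```

A script would call require_both_builds || exit 2 at startup. This puts a missing build in the exit-2 infrastructure-error category rather than reporting it as a behavioral mismatch.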
Subsection close-out (01.2) — MANDATORY before starting 01.R:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection. Retrospective 01.2: no tooling gaps. Script creation followed existing patterns cleanly; self-test infrastructure made test addition straightforward.
01.R Third Party Review Findings
- [TPR-01-001-codex][medium] diagnostics/debug-release-compare.sh:117: Return exit code 2 when either build fails. Evidence: DRIFT. The script header defines exit 1 = mismatch and 2 = infra error, but the compile-failure branches used exit 1. Fresh verification confirmed. Resolved: Fixed on 2026-04-09 in commit 337411e7. Both compile-failure branches now exit 2.
- [TPR-01-002-codex][low] diagnostics/self-test.sh:255: Exercise the infrastructure-error paths in self-test. Evidence: GAP. Self-test only checked matching fixtures, --help, and no-args; no compile-failure test existed. Resolved: Fixed on 2026-04-09 in commit 337411e7. Added a run_test_exit_code helper and a compile-failure test.
- [TPR-01-003-codex][low] plans/diagnostic-tooling-improvements/section-01-aims-compare.md:35: Synchronize plan status surfaces for Section 01 and Section 02. Evidence: DRIFT. The Section 01 body said “Not Started” while the frontmatter said complete; the overview and index were also stale. Resolved: Fixed on 2026-04-09 in commit 337411e7. All plan surfaces synced.
- [TPR-01-001-gemini][informational] diagnostics/debug-release-compare.sh:130: Document omission of stderr comparison. Non-actionable observation: stderr comparison is intentionally omitted (the exit code catches panics).
- [TPR-01-002-gemini][informational] diagnostics/_common.sh:65: Address reliance on SCRIPT_DIR convention. Non-actionable observation: the SCRIPT_DIR convention is the established pattern for all diagnostic scripts.
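TPR-01-002-codex mentions a run_test_exit_code helper, but this section does not record its signature. The sketch below shows one plausible minimal shape. It assumes the helper takes a test name, an expected exit code, and the command to run, and it uses hypothetical PASS_COUNT/FAIL_COUNT counters. The real helper in diagnostics/self-test.sh may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a self-test helper that asserts a specific exit code.
# Signature and counter names are assumptions, not the real self-test.sh API.
PASS_COUNT=0
FAIL_COUNT=0

# Usage: run_test_exit_code NAME EXPECTED_EXIT_CODE COMMAND [ARGS...]
run_test_exit_code() {
    local name="$1" expected="$2"
    shift 2
    local actual=0
    # Output is discarded; only the exit status is checked.
    "$@" >/dev/null 2>&1 || actual=$?
    if [[ "$actual" -eq "$expected" ]]; then
        PASS_COUNT=$((PASS_COUNT + 1))
        echo "PASS: $name"
    else
        FAIL_COUNT=$((FAIL_COUNT + 1))
        echo "FAIL: $name (expected exit $expected, got $actual)"
    fi
}

# Example (the fixture path is hypothetical):
# run_test_exit_code "debug-release-compare: compile failure exits 2" 2 \
#     "$SCRIPT_DIR/debug-release-compare.sh" "$SCRIPT_DIR/fixtures/compile-error.ori"
```

Asserting the exact code, and not just "nonzero", is what would have caught TPR-01-001-codex. A compile failure that exits 1 would fail this test.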
01.N Completion Checklist
- All subsections (01.1, 01.2) complete
- diagnostics/self-test.sh passes (29/29)
- timeout 150 ./test-all.sh green, no regressions (16,927 passed)
- No references to aims-compare remain in active codebase surfaces (CLAUDE.md, .claude/rules/, diagnostics/ all clean)
- /tpr-review passed: independent third-party review clean
- /impl-hygiene-review passed after TPR is clean
- /improve-tooling section-close sweep: verify both subsection retrospectives ran; add any cross-subsection patterns