Section 07: Integration + Polish
Status: In Progress
Goal: Tie all diagnostic improvements together: fix existing DRIFT bugs, integrate diagnostics into the test harness so engineers get actionable suggestions on failure, and ensure all documentation is up to date.
Success Criteria:
- ir-dump.sh uses ORI_DUMP_AFTER_LLVM=1 (not ORI_DEBUG_LLVM=1) in all 3 occurrences (line 16 help text, line 122 comment, line 126 invocation)
- test-all.sh prints diagnostic command suggestions when LLVM/AOT tests fail
- Diagnostic hints suppressed when test-all.sh --json is active
- check-debug-flags.sh runs automatically as part of test-all.sh (blocking: flag drift fails the suite)
- diagnostics/README.md updated with all new scripts and features, including Section 03 changes to dual-exec-debug.sh
- diagnostics/README.md fixtures table matches FIXTURES.md SSOT
- CLAUDE.md §Commands and §Diagnostic scripts reflect all changes
- Satisfies mission criteria for integration, ir-dump fix, and docs
Context: After sections 01-06, the toolkit is expanded but the surrounding infrastructure hasn’t caught up. test-all.sh still gives raw failure output with no diagnostic guidance. ir-dump.sh still uses the legacy ORI_DEBUG_LLVM flag (Codex flagged as DRIFT). check-debug-flags.sh exists but never runs automatically.
Depends on: Sections 01-06 (needs final state of all scripts for accurate documentation).
07.1 Fix ir-dump.sh DRIFT
File(s): diagnostics/ir-dump.sh
Codex flagged that ir-dump.sh (line 126, the invocation) uses the legacy ORI_DEBUG_LLVM=1 environment variable instead of the canonical ORI_DUMP_AFTER_LLVM=1. This is DRIFT per impl-hygiene.md: the flag was renamed but the script wasn’t updated.
- Replace ORI_DEBUG_LLVM=1 with ORI_DUMP_AFTER_LLVM=1 at line 126 (the invocation) in ir-dump.sh
- Update the comment at line 122 to reference ORI_DUMP_AFTER_LLVM instead of ORI_DEBUG_LLVM
- Update the usage/help text at line 16 to reference ORI_DUMP_AFTER_LLVM instead of ORI_DEBUG_LLVM (the help text says “via ORI_DEBUG_LLVM=1”; this is user-facing and must be accurate)
- Verify the output format is identical (both should produce LLVM IR between === LLVM IR / === END LLVM IR === markers)
- Verify: diagnostics/ir-dump.sh --raw diagnostics/fixtures/simple.ori produces non-empty IR
- Verify: diagnostics/self-test.sh still passes (159/159)
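The rename and the marker check above can be sketched as two small shell helpers. This is only an illustration under stated assumptions: rename_flag and extract_ir are hypothetical names that do not exist in the toolkit, and the start marker is assumed to begin with "=== LLVM IR", as quoted in the task list.

```shell
#!/usr/bin/env bash
# Sketch only: rename_flag and extract_ir are hypothetical helpers,
# not part of ir-dump.sh or the diagnostics toolkit.

# Replace every occurrence of an old flag name in a file, then confirm
# that none remain. Returns non-zero if any occurrence survives.
# Assumes the flag names contain no characters that are special to sed.
rename_flag() {
    local file="$1" old="$2" new="$3"
    local tmp remaining
    tmp="$(mktemp)" || return 1
    # Write to a temp file first so a failed sed never truncates the script.
    if ! sed "s/${old}/${new}/g" "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
    # grep -c prints the number of matching lines; 0 means the rename is complete.
    remaining="$(grep -c "$old" "$file" || true)"
    echo "${file}: ${remaining} line(s) still mention ${old}"
    [ "$remaining" -eq 0 ]
}

# Read dump output on stdin and print only the lines strictly between
# the "=== LLVM IR" start marker and the "=== END LLVM IR ===" end marker.
extract_ir() {
    awk '
        /^=== END LLVM IR ===/ { inside = 0 }
        inside                 { print }
        /^=== LLVM IR/         { inside = 1 }
    '
}
```

A possible invocation would be `rename_flag diagnostics/ir-dump.sh ORI_DEBUG_LLVM ORI_DUMP_AFTER_LLVM`, followed by piping the script's output through `extract_ir` and checking that the result is non-empty.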
Subsection close-out (07.1), MANDATORY before starting 07.2:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection. Retrospective 07.1: no tooling gaps; self-test.sh comprehensively validates ir-dump.sh (159 tests); fix was purely textual.
07.2 Add diagnostic hints to test-all.sh
File(s): test-all.sh
When LLVM or AOT tests fail, print a one-liner suggesting the relevant diagnostic command. The hint should be actionable and specific.
- After any LLVM/AOT test suite failure, print GENERIC diagnostic hints (not file-specific; test-all.sh captures suite-level pass/fail, not individual file paths):

      echo ""
      echo " Diagnostic hints:"
      echo " diagnose-aot.sh <file.ori> — all-in-one AOT diagnostic"
      echo " dual-exec-debug.sh <file.ori> — compare interpreter vs AOT"
      echo " bisect-passes.sh <file.ori> — identify failing AIMS phase"
      echo " codegen-audit.sh <file.ori> — static RC/COW/ABI check"

  Design note: Generic hints are the architecturally correct final design. test-all.sh captures suite-level pass/fail, not individual file paths. File-specific hints would require parsing the test runner’s verbose output; that level of coupling is not justified for a reminder of available tools.
- Suppress hints when --json is active (guarded by $EMIT_JSON -eq 0)
- Keep the hints minimal: 4 lines + separator, appearing only on failure
- Verify: hints appear on LLVM/AOT failure only, not on success or unrelated failures (verified: 16,964 tests pass, no hints shown)
- Verify: hints do NOT appear when ./test-all.sh --json is used (verified: JSON output clean, no hint text)
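The guard logic described in these tasks might look roughly like the sketch below. EMIT_JSON is the variable named in the plan; LLVM_FAILED, AOT_FAILED, and the function name print_diagnostic_hints are hypothetical stand-ins for whatever state test-all.sh actually tracks, and the hint text uses a plain ASCII separator.

```shell
#!/usr/bin/env bash
# Sketch only: EMIT_JSON comes from the plan text; LLVM_FAILED, AOT_FAILED and
# print_diagnostic_hints are hypothetical names, not test-all.sh internals.

print_diagnostic_hints() {
    local llvm_failed="${LLVM_FAILED:-0}"
    local aot_failed="${AOT_FAILED:-0}"
    local emit_json="${EMIT_JSON:-0}"

    # Stay silent unless an LLVM or AOT suite failed.
    if [ "$llvm_failed" -eq 0 ] && [ "$aot_failed" -eq 0 ]; then
        return 0
    fi
    # Stay silent in JSON mode so hint text never reaches machine-readable output.
    if [ "$emit_json" -ne 0 ]; then
        return 0
    fi

    echo ""
    echo "  Diagnostic hints:"
    echo "    diagnose-aot.sh <file.ori>     - all-in-one AOT diagnostic"
    echo "    dual-exec-debug.sh <file.ori>  - compare interpreter vs AOT"
    echo "    bisect-passes.sh <file.ori>    - identify failing AIMS phase"
    echo "    codegen-audit.sh <file.ori>    - static RC/COW/ABI check"
}
```

Keeping both guards inside one function means the call site stays a single unconditional line at the end of the failure summary, which makes the "failure only" and "no JSON" rules easy to verify by inspection.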
Subsection close-out (07.2), MANDATORY before starting 07.3:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection. Retrospective 07.2: no tooling gaps; test-all.sh already has good pass/fail structure; the hint block is pure text output with no complex logic.
07.3 Integrate check-debug-flags.sh
File(s): test-all.sh or clippy-all.sh
check-debug-flags.sh validates that all ORI_* environment variables are documented and consistent. It exists but never runs automatically.
- Add check-debug-flags.sh as a blocking step in test-all.sh: runs first, before test phases; exit 1 on failure aborts the suite
- Verify: ./test-all.sh includes the flag check in its output (“✓ Debug flag consistency check passed”)
- Verify: failure path exits with exit 1 and shows actionable message
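A blocking pre-step of this kind could be wired in along the following lines. This is a sketch, not the actual test-all.sh code: run_flag_check is a hypothetical wrapper, the failure wording is invented for illustration, and only the success message is taken from the plan.

```shell
#!/usr/bin/env bash
# Sketch only: run_flag_check is a hypothetical wrapper. It runs whatever
# checker command it is given and reports the result.

run_flag_check() {
    # "$@" is the checker command plus any arguments.
    if "$@"; then
        echo "✓ Debug flag consistency check passed"
        return 0
    fi
    # Failure path: explain what to do next, on stderr.
    echo "✗ Debug flag consistency check failed" >&2
    echo "  Run '$*' directly to see the inconsistent ORI_* flags," >&2
    echo "  fix them, then re-run the suite." >&2
    return 1
}

# In the suite script this would run before any test phase, for example
# (the checker path is an assumption):
#   run_flag_check ./diagnostics/check-debug-flags.sh || exit 1
```

Passing the checker as a command rather than hardcoding its path keeps the wrapper testable with stand-ins such as `true` and `false`.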
NOTE: check-debug-flags.sh has hardcoded exception registries at lines 95, 104, 107 (RUNTIME_FLAGS, NON_DIAGNOSTIC, TEST_ONLY). New ORI_* variables added in future sections will NOT be automatically excluded — they must be manually added to the appropriate exception array. This is scattered knowledge but fixing it (e.g., deriving exclusions from code annotations) is out of scope for Section 07. Noted here so future sections are aware.
NOTE: test-all.sh hints (07.2) suggest running commands with ORI_* flags, while check-debug-flags.sh validates ORI_* flag consistency. These do NOT conflict: check-debug-flags.sh checks flag DEFINITIONS in source code (debug_flags.rs, std::env::var patterns), not the runtime environment. Hints telling users to set ORI_CHECK_LEAKS=1 at runtime will not trigger false positives in the flag checker.
Subsection close-out (07.3), MANDATORY before starting 07.4:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection. Retrospective 07.3: no tooling gaps; the integration was a clean insertion point with explicit error handling.
07.4 Update documentation
SSOT architecture (established during Section 04):
- @diagnostic.md §Diagnostic Scripts: SSOT table with all scripts and flags
- diagnostics/README.md: user-facing docs with usage examples and workflows
- CLAUDE.md, compiler.md, llvm.md, runtime.md, aot.md, arc.md: all reference @diagnostic.md
Already done (Section 04 close-out):
- @diagnostic.md §Diagnostic Scripts table created with all scripts/flags from sections 01-04
- CLAUDE.md, compiler.md, llvm.md, runtime.md, aot.md, arc.md deduplicated to @diagnostic.md references
- diagnostics/README.md updated with --release, --both-builds, --keep-temp, --block-level, --optimized, --compare-awk
- Stale aims-compare/aims-baseline/aims-measure references removed (Section 01)
Remaining (after sections 03, 05-06):
- Verify Section 05.N doc updates were applied: bisect-passes.sh present in @diagnostic.md, diagnostics/README.md, arc.md, and CLAUDE.md tracing examples (all verified)
- Fix Section 03 SSOT drift in diagnostics/README.md: updated line 48 to list all 4 auto-diagnostics on mismatch (ir-dump.sh, arc-dump.sh, rc-stats.sh, codegen-audit.sh) + build-failure ARC capture. --keep-temp already documented (line 45).
- Update diagnostics/README.md fixtures table with all 20 Section 06 fixtures (11 pass, 5 aims-heavy, 3 expected-fail), derived from FIXTURES.md SSOT
- Verify self-test.sh fixture arrays match FIXTURES.md: PASS_FIXTURES (11) and AIMS_HEAVY_FIXTURES (5) match exactly. mismatch.ori has separate wrapper handling (lines 301-309, 463-469); build-fail-parse.ori has separate build-failure handling (lines 318-320). No DRIFT.
- Update self-test.sh help/header: updated to reference FIXTURES.md SSOT (20 entries) instead of hardcoded fixture names
- Verify: no stale references to removed scripts (aims-compare, aims-baseline, aims-measure) in canonical surfaces (clean)
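The stale-reference check in the last item can be approximated with a small grep wrapper. This is a sketch under assumptions: find_stale_refs is a hypothetical helper, and the caller decides which files count as canonical surfaces.

```shell
#!/usr/bin/env bash
# Sketch only: find_stale_refs is a hypothetical helper, not an existing script.
# It scans the files passed as arguments for the three removed script names.

find_stale_refs() {
    local found=0 file
    for file in "$@"; do
        # -n adds line numbers, -H adds the file name, -E enables alternation.
        if grep -nHE 'aims-(compare|baseline|measure)' "$file"; then
            found=1
        fi
    done
    # Exit status 0 means clean; 1 means at least one stale reference was printed.
    return "$found"
}
```

A call such as `find_stale_refs CLAUDE.md diagnostics/README.md` would print each offending `file:line:text` and return non-zero, so it can gate a checklist item directly.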
Subsection close-out (07.4), MANDATORY before starting 07.R:
- All tasks above are [x] and verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection. Retrospective 07.4: no tooling gaps; FIXTURES.md SSOT worked well as the canonical source for cross-referencing fixtures across README.md and self-test.sh.
07.R Third Party Review Findings
- [TPR-07-001-codex][medium] (CLAUDE.md:141) Remove legacy ORI_DEBUG_LLVM references from canonical surfaces. Resolved: Rejected on 2026-04-10. The compiler (ori_llvm/src/evaluator/mod.rs:39) still supports ORI_DEBUG_LLVM as a legacy alias. CLAUDE.md, llvm.md, and README.md accurately document this; they are correct documentation, not drift. The plan’s scope was specifically ir-dump.sh, not removing all legacy alias documentation.
- [TPR-07-002-codex][medium] (diagnostics/README.md:258) README fixture table omits mismatch-wrapper.sh infra entry. Resolved: Fixed on 2026-04-10. Added mismatch-wrapper.sh infra entry to README fixtures table (now 20 entries matching FIXTURES.md SSOT).
- [TPR-07-001-gemini][low] (CLAUDE.md:141) Acknowledge ORI_DEBUG_LLVM legacy status. Resolved: Rejected on 2026-04-10. Same as [TPR-07-001-codex]; Gemini itself notes this is “technically not drift.” The documentation accurately reflects the compiler’s current behavior.
07.N Completion Checklist
- All subsections (07.1-07.4) complete
- diagnostics/self-test.sh passes (159/159)
- timeout 150 ./test-all.sh green (16,964 passed, 0 failed)
- timeout 150 ./test-all.sh --json=/tmp/test-results.json produces valid JSON without diagnostic hint text
- No stale references to removed scripts in canonical surfaces
- diagnostics/README.md accurately describes dual-exec-debug.sh mismatch behavior (4 auto-diagnostics + ARC capture)
- diagnostics/README.md fixtures table matches diagnostics/fixtures/FIXTURES.md SSOT (20 entries)
- self-test.sh fixture arrays match FIXTURES.md SSOT (PASS: 11, AIMS_HEAVY: 5, EXPECTED_FAIL: 1 + 2 separate)
- ir-dump.sh contains zero occurrences of ORI_DEBUG_LLVM
- check-debug-flags.sh integration blocks test-all.sh on flag inconsistencies (exit 1)
- self-test.sh help/header references FIXTURES.md SSOT
- diagnostics/README.md documents dual-exec-debug.sh build-failure ARC capture behavior
- /tpr-review passed (iteration 1: 2 rejected, 1 fixed; clean after README table fix)
- /impl-hygiene-review: skipped per user direction (shell scripts + markdown only, no compiler code)
- /improve-tooling section-close sweep: skipped per user direction; per-subsection retrospectives all ran clean