
Section 03: Verification

Status: Not Started

Goal: Verify that subprocess isolation works correctly end-to-end. Confirm behavioral equivalence with the old in-process path, verify crash isolation, measure performance overhead, and validate test gate integrity.

Success Criteria:

  • ./test-all.sh passes with no CRASHED status
  • Behavioral equivalence: same counts for non-crashing files
  • Crash isolation: parent survives worker SIGSEGV
  • Performance: wall-clock within 2x baseline
  • Weakened test gate confirmed reverted: no ORI_LLVM_CRASHED variable or exit-0 escape hatch remains in test-all.sh
  • Debug AND release builds pass
  • Satisfies all mission success criteria

Context: The subprocess isolation changes how LLVM spec tests are executed — from in-process to per-file subprocesses. This must not change observable results for files that work correctly, must contain crashes for files that don’t, and must not unacceptably slow down the test suite.

Depends on: Section 02 (orchestrator fully operational).


03.1 Behavioral Equivalence

Verify that the subprocess-based runner produces identical results to the old in-process runner for all non-crashing test files.

Approach: Use the --json flag for machine-comparable output. The in-process path is still accessible by directly calling run_file_with_interner() in a Rust test (bypassing the orchestrator).

  • Baseline capture: Before the orchestrator is wired in (or using --json in worker mode directly), run ori test --backend=llvm --json tests/spec/ and save per-file pass/fail/skip/lcfail counts. Script: timeout 150 ./target/release/ori test --backend=llvm --json tests/spec/ > /tmp/llvm-baseline.json 2>/dev/null

  • Subprocess capture: After wiring in the orchestrator (02.4), run the same test suite through subprocess isolation and capture counts. The orchestrator’s human-readable output includes per-file counts in print_summary_stats().

  • Diff: Compare per-file results programmatically. Every non-crashing file must produce identical outcomes. Write a Rust test or script that compares the two JSON outputs.

  • Edge cases to verify (each becomes a specific test assertion):

    • File with 0 LLVM-eligible tests (only compile_fail tests) — verify results: [] and counters at 0
    • File where all tests are marked #skip — verify all outcomes are Skipped
    • File with LlvmCompileFail outcomes — verify codegen errors produce LlvmCompileFail, not BackendCrash
    • File with mixed outcomes (some pass, some fail) — verify pass/fail counts match
    • File with large test count (>20 tests in one file) — verify all tests appear in JSON output
    • File that uses print() — verify JSON is still extractable despite stdout pollution
    • File with SkippedUnchanged outcomes (incremental mode) — verify JSON correctly represents skipped-unchanged tests if incremental is enabled
  • /tpr-review passed — independent review found no critical or major issues (or all findings triaged)

  • /impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.

  • Subsection close-out (03.1) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (e.g. build(diagnostics): ... — surfaced by section-03.1 retrospective; build/test/chore/ci/docs are valid types, while tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 03.1: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.

  • /sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.

  • Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
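The per-file diff step in 03.1 can be sketched as a small comparator over the two JSON captures. This is a minimal sketch, not the real implementation: FileCounts and diff_counts are illustrative names, and a real test would deserialize the actual JsonFileSummary records rather than build maps by hand.

```rust
use std::collections::BTreeMap;

/// Per-file outcome counts (pass/fail/skip/lcfail). Illustrative
/// stand-in for the real JsonFileSummary fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileCounts {
    passed: u32,
    failed: u32,
    skipped: u32,
    lcfail: u32,
}

/// Compare the baseline (in-process) capture against the subprocess
/// capture. Returns files whose counts differ, plus files present in
/// one capture but missing from the other.
fn diff_counts(
    baseline: &BTreeMap<String, FileCounts>,
    subprocess: &BTreeMap<String, FileCounts>,
) -> Vec<String> {
    let mut mismatches = Vec::new();
    for (file, base) in baseline {
        match subprocess.get(file) {
            Some(sub) if sub == base => {}
            _ => mismatches.push(file.clone()),
        }
    }
    for file in subprocess.keys() {
        if !baseline.contains_key(file) {
            mismatches.push(file.clone());
        }
    }
    mismatches
}

fn main() {
    let mut base = BTreeMap::new();
    base.insert(
        "a.ori".to_string(),
        FileCounts { passed: 3, failed: 0, skipped: 1, lcfail: 0 },
    );
    let mut sub = base.clone();
    // Identical captures: no mismatches.
    assert!(diff_counts(&base, &sub).is_empty());
    // A drifted count must surface the file.
    sub.get_mut("a.ori").unwrap().failed = 1;
    assert_eq!(diff_counts(&base, &sub), vec!["a.ori".to_string()]);
    println!("ok");
}
```

The comparator treats a missing file the same as a count mismatch, so dropped files cannot silently pass the equivalence check.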


03.2 Crash Isolation Verification

Verify that worker crashes are contained and correctly reported.

  • Crash canary identification: Find or create a minimal test file that triggers the known LLVM C++ crash.

    • Step 1: Run timeout 150 ./target/release/ori test --backend=llvm tests/spec/ 2>&1 and check exit code. If >128, identify the crashing files from stderr.
    • Step 2: If a crashing file is found, extract the minimal reproducer into tests/spec/llvm_worker_crash_canary.ori.
    • Step 3: If no file currently crashes (all handled by catch_unwind + LlvmCompileFail): create a Rust integration test that uses Command::new("sh").arg("-c").arg("kill -11 $$") as a crash simulation, rather than relying on finding a specific Ori crash pattern. This simulates the crash scenario the orchestrator must handle.
    • Once this plan lands, crash canary files should produce BackendCrash instead of crashing the runner.
  • Verify parent survival (integration test in compiler/oric/src/test/runner/llvm_worker/tests.rs):

    • test_parent_survives_crash — run ori test --backend=llvm including the crash canary. Verify:
      • Exit code is 0 or 1 (NOT 139 = SIGSEGV)
      • Stdout contains “CRASH” or “BackendCrash” for the canary file
      • Stdout contains “PASS” for non-crashing files (parent continued after crash)
  • Verify exit code blocking (integration test):

    • test_backend_crash_blocks_gate — run only the crash canary file, verify exit code == 1
  • Verify timeout mechanism (unit test, already covered in 02.2.T):

    • Confirm test_wait_with_timeout_kills_slow_process passes with short timeout
  • Verify multiple concurrent crashes (integration test):

    • test_multiple_crashes_all_reported — run with 3+ crash canaries interspersed with good files. All crashes reported, all good files produce correct results. No partial runs or hangs.
  • Debug AND release build verification:

    • Run timeout 150 cargo build && ./target/debug/ori test --backend=llvm tests/spec/types/primitives.ori — verify debug build works
    • Run timeout 150 cargo build --release && ./target/release/ori test --backend=llvm tests/spec/types/primitives.ori — verify release build works
    • Both produce identical pass/fail counts for the same file
  • /tpr-review passed — independent review found no critical or major issues (or all findings triaged)

  • /impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.

  • Subsection close-out (03.2) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (e.g. build(diagnostics): ... — surfaced by section-03.2 retrospective; build/test/chore/ci/docs are valid types, while tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 03.2: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.

  • /sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.

  • Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
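The Step 3 crash-simulation fallback can be sketched as below: the parent spawns a child that SIGSEGVs itself (Unix-only, via kill -11) and asserts it observed the signal rather than dying with it. This is a minimal sketch of the idea, not the actual orchestrator test in llvm_worker/tests.rs.

```rust
use std::process::{Command, ExitStatus};

/// Spawn a child shell that kills itself with SIGSEGV (signal 11),
/// simulating an LLVM C++ crash inside a worker process. The parent
/// (this process) keeps running — the crash is contained in the child.
fn run_crash_canary() -> ExitStatus {
    Command::new("sh")
        .arg("-c")
        .arg("kill -11 $$")
        .status()
        .expect("failed to spawn crash-simulation child")
}

fn main() {
    let status = run_crash_canary();

    // Parent is still alive here; the child's death was observed, not shared.
    assert!(!status.success());

    #[cfg(unix)]
    {
        use std::os::unix::process::ExitStatusExt;
        // A signal-terminated child has no exit code, only a signal.
        assert_eq!(status.code(), None);
        assert_eq!(status.signal(), Some(11)); // SIGSEGV
    }
    println!("parent survived worker crash");
}
```

The same shape maps onto test_parent_survives_crash: the assertions on status.code() and status.signal() are exactly the distinction between "worker failed" (reportable) and "runner crashed" (exit 139).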


03.3 Performance Measurement

Measure the overhead of subprocess isolation vs in-process execution.

Important context: Each worker process re-parses and re-typechecks its file from scratch. This duplicates work but is necessary for process isolation (no shared memory across the process boundary). However, with subprocess isolation, the LLVM Context::create() global lock contention that forced sequential execution (see the runner/mod.rs lines 116-120 comment) no longer applies — each process has its own LLVM context. Parallelism should largely offset the per-file overhead.
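The expected trade-off can be sanity-checked with back-of-the-envelope arithmetic before measuring. The inputs below (300 files, ~100ms of work per file, ~30ms spawn+parse overhead, 8 workers) are assumptions for illustration, not measured numbers:

```rust
/// Rough model of wall-clock time for the test run: per-file work plus
/// per-file subprocess overhead, split ideally across N workers.
/// All inputs are assumptions; this only checks the shape of the math.
fn estimated_wall_clock(
    files: u32,
    per_file_work_ms: u32,
    spawn_overhead_ms: u32,
    workers: u32,
) -> u32 {
    let per_file = per_file_work_ms + spawn_overhead_ms;
    (files * per_file).div_ceil(workers)
}

fn main() {
    // In-process baseline: sequential, no spawn overhead.
    let baseline = estimated_wall_clock(300, 100, 0, 1); // 30_000 ms
    // Subprocess sequential: +30ms spawn + JSON parse per file.
    let seq = estimated_wall_clock(300, 100, 30, 1); // 39_000 ms, ~1.3x
    // Subprocess parallel on 8 cores.
    let par = estimated_wall_clock(300, 100, 30, 8); // 4_875 ms

    // The acceptance criteria hold under these assumptions:
    assert!(seq <= 2 * baseline); // sequential within 2x
    assert!(par < baseline); // parallel beats baseline outright
    println!("baseline={baseline}ms seq={seq}ms par={par}ms");
}
```

Under these assumptions parallel execution more than absorbs the subprocess overhead, which is why the parallel acceptance bound is tighter (1.5x) than the sequential one (2x).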

  • Baseline measurement (before wiring orchestrator, or using git stash): Time the in-process sequential LLVM spec test run:

    time timeout 150 ./target/release/ori test --backend=llvm tests/spec/

    Record: wall-clock time, total files processed, total tests. Save as plans/llvm-worker-isolation/perf-baseline.txt.

  • Subprocess sequential: Time with subprocess isolation, sequential:

    time timeout 150 ./target/release/ori test --backend=llvm --no-parallel tests/spec/
  • Subprocess parallel (default): Time with subprocess isolation, default parallelism:

    time timeout 150 ./target/release/ori test --backend=llvm tests/spec/
  • Overhead analysis: Calculate per-file subprocess overhead:

    • Expected: ~10-50ms per file for process spawn + JSON parse
    • With ~300 files sequential: ~3-15s total overhead
    • With parallelism (N = CPU count): overhead amortized, net speedup if N > 2
  • Acceptance criteria:

    • Sequential: wall-clock within 2x of baseline
    • Parallel: wall-clock within 1.5x of baseline (parallelism should offset subprocess overhead)
    • If parallel is FASTER than baseline (likely with CPU count > 2), that’s a bonus
  • If too slow: Profile to identify bottleneck:

    1. Process spawn overhead? → measure with time sh -c 'for i in $(seq 300); do ./target/release/ori --version; done'
    2. JSON parse overhead? → benchmark serde_json::from_str on a typical JsonFileSummary
    3. Re-parsing/re-typechecking? → compare single-file in-process vs subprocess time
    4. Mitigations (not in this plan, future optimization): batch multiple files per worker, pre-compute data via temp file
  • /tpr-review passed — independent review found no critical or major issues (or all findings triaged)

  • /impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.

  • Subsection close-out (03.3) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (e.g. build(diagnostics): ... — surfaced by section-03.3 retrospective; build/test/chore/ci/docs are valid types, while tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 03.3: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.

  • /sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.

  • Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
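If profiling step 1 is needed, spawn cost can also be measured directly from Rust rather than the shell loop. This sketch spawns the trivial `true` binary as a stand-in for `./target/release/ori --version` (the real binary would add its own startup cost on top):

```rust
use std::process::Command;
use std::time::Instant;

fn main() {
    // Measure bare process-spawn cost: N spawns of a trivial binary.
    let n: u32 = 50;
    let start = Instant::now();
    for _ in 0..n {
        let status = Command::new("true").status().expect("spawn failed");
        assert!(status.success());
    }
    let elapsed = start.elapsed();
    println!(
        "{} spawns in {:?} (~{:?} per spawn)",
        n,
        elapsed,
        elapsed / n
    );
}
```

If the per-spawn figure lands in the expected ~10-50ms range (or well below it), spawn overhead is unlikely to be the bottleneck and the JSON-parse and re-typecheck hypotheses come next.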


03.4 Test Gate Integrity

Verify that the test gate (./test-all.sh) correctly reflects the new subprocess-based execution.

  • test-all.sh output verification: Run timeout 150 ./test-all.sh and check the LLVM backend line. It should show:

    • Ori spec (LLVM backend) N passed, M failed, K skipped, L llvm compile fail (with optional B backend crash count if crashes exist)
    • NOT CRASHED — the parent process no longer crashes. test-all.sh lines 458-459 previously showed CRASHED when the exit code was > 128; this path was removed in 02.4.
    • If BackendCrash outcomes exist, they appear as a separate count parsed by parse_ori_results()
  • Exit code propagation: test-all.sh exit code is non-zero when BackendCrash outcomes exist:

    • If crashes exist: ori test --backend=llvm exits 1 → ORI_LLVM_EXIT=1 → ANY_FAILED > 0 → test-all.sh exits 1
    • If no crashes: exit 0 as before
  • JSON output: If test-all.sh emits JSON (--json or --json=<path>, lines 33-41), verify BackendCrash outcomes appear. The emit_json() function (line 480) was updated in 02.4 to remove the ORI_LLVM_CRASHED path — verify it now emits backend_crash count in the suite JSON. Specifically:

    • Run timeout 150 ./test-all.sh --json=/tmp/test-results.json and verify the LLVM backend suite entry has numeric passed/failed/skipped/lcfail (not "status": "crashed")
  • Pre-commit hook: Verify ./full-check.sh (runs ./clippy-all.sh then ./test-all.sh) passes when no crashes occur. This is the ultimate acceptance test:

    • Run timeout 150 ./full-check.sh — verify exit code 0
  • Weakened gate confirmed removed: Verify by grep:

    • grep -c ORI_LLVM_CRASHED test-all.sh returns 0
    • grep -c ANY_CORE_FAILED test-all.sh returns 0
    • This is the core deliverable: crashes are real failures that block the gate.
  • Regression guard: The tests from 02.2.T and 02.4.T serve as permanent regression guards. No additional CI-style test needed — test-all.sh itself IS the regression test (it now reports crashes as failures instead of hiding them).

  • /tpr-review passed — independent review found no critical or major issues (or all findings triaged)

  • /impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.

  • Subsection close-out (03.4) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (e.g. build(diagnostics): ... — surfaced by section-03.4 retrospective; build/test/chore/ci/docs are valid types, while tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 03.4: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.

  • /sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.

  • Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
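The weakened-gate grep checks can also be captured as a tiny guard. The sketch below works over an in-memory script body for illustration; a real test would read test-all.sh from the repo root and assert the result is empty:

```rust
/// Return the forbidden escape-hatch identifiers still present in the
/// given script source. Empty result == the weakened gate is gone.
fn forbidden_hatches(script: &str) -> Vec<&'static str> {
    const FORBIDDEN: [&str; 2] = ["ORI_LLVM_CRASHED", "ANY_CORE_FAILED"];
    FORBIDDEN
        .iter()
        .copied()
        .filter(|needle| script.contains(needle))
        .collect()
}

fn main() {
    // A cleaned-up gate: only the legitimate variables remain.
    let clean = "ORI_LLVM_EXIT=$?\n[ $ANY_FAILED -gt 0 ] && exit 1\n";
    assert!(forbidden_hatches(clean).is_empty());

    // The old exit-0 escape hatch must be flagged.
    let weakened = "if [ -n \"$ORI_LLVM_CRASHED\" ]; then exit 0; fi\n";
    assert_eq!(forbidden_hatches(weakened), vec!["ORI_LLVM_CRASHED"]);
    println!("gate guard ok");
}
```

Encoding the checks this way makes them a permanent regression guard rather than a one-off grep during section close-out.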


03.R Third Party Review Findings

  • None.

03.N Completion Checklist

  • Behavioral equivalence verified: subprocess results match in-process for all non-crashing files (per-file diff of pass/fail/skip/lcfail counts)
  • Crash isolation verified: parent survives worker SIGSEGV, reports BackendCrash
  • Multiple concurrent crashes handled correctly (3+ canaries interspersed with good files)
  • Timeout mechanism verified (unit tests from 02.2.T)
  • Debug AND release builds pass with identical results
  • Performance measured: wall-clock within 2x baseline (sequential), within 1.5x (parallel)
  • Performance numbers recorded in plans/llvm-worker-isolation/perf-baseline.txt
  • test-all.sh output correct: no CRASHED, BackendCrash in counts
  • Weakened test gate confirmed removed: grep -c ORI_LLVM_CRASHED test-all.sh returns 0
  • ANY_CORE_FAILED confirmed removed: grep -c ANY_CORE_FAILED test-all.sh returns 0
  • test-all.sh --json output includes backend_crash count (not "status": "crashed")
  • Pre-commit hook (./full-check.sh) passes
  • Crash canary test file or simulation committed
  • timeout 150 ./test-all.sh passes
  • ./clippy-all.sh passes
  • Final file size audit:
    • llvm_worker.rs under 500 lines
    • runner/mod.rs under 575 lines (goal: net reduction from removing stale LLVM comment)
    • result/mod.rs under 350 lines (only ~10 lines added)
    • json_protocol.rs under 200 lines
    • commands/test.rs under 260 lines (only ~15 lines added)
  • Plan annotation cleanup: bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 03 returns 0 annotations
  • Plan sync — update plan metadata:
    • All section frontmatter status → complete
    • 00-overview.md Quick Reference and mission criteria checked
    • index.md statuses updated
    • JIT EH plan section-06-lcfail-resolution.md updated with note that LLVM backend crash is now contained
  • /tpr-review passed
  • /impl-hygiene-review passed
  • /improve-tooling retrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which diagnostics/ scripts you ran, which command sequences you repeated, where you added ad-hoc dbg!/tracing calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE /commit-push. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See .claude/skills/improve-tooling/SKILL.md “Retrospective Mode” for the full protocol.

Exit Criteria: ./test-all.sh passes with exit code 0. The LLVM backend summary line shows pass/fail/crash counts (not CRASHED). Worker crashes produce BackendCrash outcomes that block the test gate. Performance overhead is within 2x of baseline. The pre-commit hook (./full-check.sh) passes for .rs file changes. All mission success criteria are met.