Section 06: Test Matrix + Dual-Execution Parity
Status: Not Started
Goal: Integrated verification across the four facets. Per CLAUDE.md §Fix Completeness: matrix tests + semantic pins + debug+release parity + interpreter+LLVM parity + zero leaks. This section is the final gate before close-out.
Context: Per-section tests (§02.3, §03.3, §04.3, §05.3) cover each facet in isolation. This section adds:
- Cross-facet interaction tests (catch bugs that only surface when two facets combine).
- Sweep verification (dual-exec, ORI_CHECK_LEAKS, debug+release parity across the full new-test set).
- Whole-suite regression check.
The four facets together change the mono-collection dispatch path significantly; integration testing catches interaction bugs per-section tests miss.
Reference implementations:
- Rust rustc_mir_transform test corpus uses cross-pass test fixtures: a single source file exercises multiple optimizations simultaneously.
- Swift swift/test/SILOptimizer/ includes “diagonal” tests covering interactions between SIL passes.
Depends on: §02, §03, §04, §05 (all four facets must have shipped tests before integration testing makes sense).
Intelligence Reconnaissance
Queries planned:
- scripts/intel-query.sh --human file-symbols "compiler/ori_llvm/tests/aot" --repo ori (full inventory of the AOT test corpus, to identify potential cross-facet test surfaces).
- scripts/intel-query.sh --human file-symbols "compiler_repo/diagnostics" --repo ori (verify the dual-exec-verify.sh + ORI_CHECK_LEAKS infrastructure surface).
Results summary (≤500 chars, recorded 2026-05-14) [ori]: TO BE POPULATED at section start by running the queries above. Scaffold authored 2026-05-14; queries are deferred to execution time, once the test surfaces of the four facet sections have landed.
06.1 Cross-facet interaction tests
File(s): compiler_repo/compiler/ori_llvm/tests/aot/mono_cross_facet.rs (new file)
Each test exercises ≥2 of the four facets simultaneously:
| Interaction | Tests |
|---|---|
| §02 (import) ∩ §03 (inherent method on generic) | test_imported_inherent_method_box_int — impl<T> Box<T> defined in module A, called as Box<int>.unwrap() from module B |
| §02 (import) ∩ §04 (complex generic arg) | test_imported_method_on_option_list — Option<T>::map defined in module A, called as Option<[int]>.map(...) from module B |
| §02 (import) ∩ §05 (builtin Apply) | test_imported_generic_with_cast — generic function in module A uses xs.len() as int internally, called from module B |
| §03 ∩ §04 | test_inherent_method_on_complex_generic — impl<T> Wrapper<T> on Wrapper<[int]> |
| §03 ∩ §05 | test_inherent_method_using_cast — impl<T> Box<T> { @len_as_int (self) -> int = self.inner.len() as int } |
| §04 ∩ §05 | test_option_complex_arg_with_cast — Option<[int]>.unwrap().len() as int |
| §02 ∩ §03 ∩ §04 | test_imported_inherent_method_on_complex_generic — full diagonal |
| All four facets | test_full_diagonal_imported_inherent_complex_cast |
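As a sanity check on the matrix above, the following sketch enumerates every unordered pair of the four facet sections. It is illustrative only: the variable names are hypothetical and nothing here is part of the repository's tooling. Four facets yield six pairs, which matches the six pairwise rows of the table; the remaining two rows are one three-way combination and the full diagonal.

```shell
# Hypothetical helper: list every unordered pair of facet sections.
# Facets are written without the leading zero to keep integer comparison simple.
facets="2 3 4 5"
pairs=""
pair_count=0
for a in $facets; do
  for b in $facets; do
    # Keep each unordered pair once by requiring a < b.
    if [ "$a" -lt "$b" ]; then
      pairs="$pairs 0$a+0$b"
      pair_count=$((pair_count + 1))
    fi
  done
done
echo "pairwise interactions ($pair_count):$pairs"
```

Comparing the printed pairs against the Interaction column is a quick way to confirm that no pairwise combination was dropped while editing the table.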
- Author all 8 cross-facet tests.
- Each test carries a positive pin (the Ok behavior) and a negative pin (an assertion that would fail if the fix from any of the cited sections were reverted).
- Subsection close-out (06.1), MANDATORY before §06.2:
  - All 8 cross-facet tests pass.
  - Run /improve-tooling retrospectively on §06.1.
  - Run compiler_repo/diagnostics/repo-hygiene.sh --check.
06.2 Dual-execution parity sweep
File(s): N/A (read-only verification)
Per CLAUDE.md §Fix Completeness: every new or modified test MUST have interpreter+LLVM parity.
- Enumerate every test added by §02.4 (4 + 1 negative), §03.3 (6 + 1), §04.3 (9 + 1), §05.3 (7 + 1), §06.1 (8). Total = 38 tests.
- Run timeout 150 diagnostics/dual-exec-verify.sh --json | jq '.per_test[] | select(.parity_status != "match")'. Result MUST be empty.
- If any test reports parity divergence: STOP. The divergence is a fix-completeness bug per CLAUDE.md §Fix Completeness. Pull into scope, fix at the affected backend (eval or LLVM), re-run.
- Subsection close-out (06.2), MANDATORY before §06.3:
  - dual-exec-verify.sh reports an empty divergence list.
  - If any divergences surfaced and got fixed: a HISTORY block in this section’s body documents the divergence + cure + commit SHA.
  - Run /improve-tooling retrospectively on §06.2.
  - Run compiler_repo/diagnostics/repo-hygiene.sh --check.
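The enumeration and gating steps in this subsection can be sketched as a small shell helper. This is a minimal sketch, not the repository's tooling: the function name require_no_divergence and the variable expected_total are hypothetical, and the helper assumes the jq filter cited in this subsection prints one JSON object per diverging test and nothing at all when parity holds.

```shell
# Expected size of the sweep, tallied from the per-section counts listed above:
# §02.4 (4 + 1), §03.3 (6 + 1), §04.3 (9 + 1), §05.3 (7 + 1), §06.1 (8).
expected_total=$(( (4 + 1) + (6 + 1) + (9 + 1) + (7 + 1) + 8 ))
echo "expected tests in sweep: $expected_total"

# Hypothetical gate: reads the divergence report on stdin and fails if it is non-empty.
require_no_divergence() {
  report=$(cat)
  if [ -n "$report" ]; then
    echo "parity divergence found; STOP and pull into scope:" >&2
    printf '%s\n' "$report" >&2
    return 1
  fi
  echo "no parity divergence"
}

# Assumed usage (not executed here), piping the documented command into the gate:
#   timeout 150 diagnostics/dual-exec-verify.sh --json \
#     | jq '.per_test[] | select(.parity_status != "match")' \
#     | require_no_divergence
```

Turning the "MUST be empty" rule into an exit status lets the sweep fail loudly in a script instead of relying on someone reading the jq output.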
06.3 Leak verification under ORI_CHECK_LEAKS
File(s): N/A (read-only verification)
- Run each new test with ORI_CHECK_LEAKS=1 set: ORI_CHECK_LEAKS=1 cargo test --release -p ori_llvm --test aot --no-fail-fast
- Parse the output for LEAK detected markers; the count MUST be zero.
- If any leak surfaces: STOP. Pull into scope, fix the leaking allocation site, re-run. New mono-instance registration paths could create new alloc/dealloc imbalances if the emission changes in §02/§03/§04/§05 mishandle RC.
- Subsection close-out (06.3), MANDATORY before §06.4:
  - Zero leaks reported.
  - Run /improve-tooling retrospectively on §06.3.
  - Run compiler_repo/diagnostics/repo-hygiene.sh --check.
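Counting the leak markers can be sketched as follows. The function name count_leak_markers is hypothetical, the sample log is made up purely for illustration, and the sketch assumes the marker text is exactly LEAK detected as cited in this subsection.

```shell
# Hypothetical helper: count lines on stdin that contain the "LEAK detected" marker.
# grep -c prints 0 but exits non-zero when nothing matches, so that status is masked.
count_leak_markers() {
  grep -c 'LEAK detected' || true
}

# Assumed usage (not executed here):
#   ORI_CHECK_LEAKS=1 cargo test --release -p ori_llvm --test aot --no-fail-fast 2>&1 \
#     | count_leak_markers
# Any result other than 0 means STOP and fix the leaking allocation site.

# Demonstration on a made-up two-line log (the real log format may differ).
sample_count=$(printf 'test a ... ok\nLEAK detected: example\n' | count_leak_markers)
echo "marker lines in sample log: $sample_count"
```

Note that grep -c counts matching lines rather than individual occurrences, which is sufficient for a zero-or-nonzero gate.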
06.4 Whole-suite regression check
File(s): N/A (read-only verification)
- Run timeout 150 ./test-all.sh (whole suite green).
- Run timeout 150 ./clippy-all.sh (zero new warnings).
- Run timeout 150 cargo st (spec tests green).
- Verify no #[ignore] annotations introduced anywhere in compiler_repo/compiler/ori_llvm/tests/aot/ by this plan (the test_generic_method_on_generic_type un-ignore from §03.2 should be the only delta).
- Verify no #skip annotations introduced in compiler_repo/tests/spec/ related to this plan’s work.
- state.sh refresh --dispositions-only && state.sh show --json | jq '.test_dispositions.totals.untracked' returns 0.
- Subsection close-out (06.4), status: complete:
  - All checks above pass.
  - Run /improve-tooling retrospectively on §06.4.
  - Run compiler_repo/diagnostics/repo-hygiene.sh --check.
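The two annotation checks in this subsection can be approximated by scanning a unified diff for added lines. This is a sketch under stated assumptions: the function name added_annotations is hypothetical, BASE is a placeholder for whatever commit the plan started from, and the sample diff is made up for illustration.

```shell
# Hypothetical helper: print added diff lines that introduce #[ignore] or #skip.
# Reads a unified diff on stdin; "+++" file headers are filtered out.
added_annotations() {
  grep -E '^\+.*(#\[ignore|#skip)' | grep -v '^+++' || true
}

# Assumed usage (not executed here); BASE is a placeholder, not a real ref:
#   git diff "$BASE" -- compiler_repo/compiler/ori_llvm/tests/aot/ compiler_repo/tests/spec/ \
#     | added_annotations
# Empty output means the plan introduced no new annotations.

# Demonstration on a made-up diff: a removed #[ignore] is allowed, an added one is flagged.
sample_hits=$(printf '%s\n' \
  '+++ b/tests/aot/example.rs' \
  '-#[ignore]' \
  '+#[ignore = "todo"]' \
  ' fn unchanged() {}' | added_annotations)
echo "flagged: $sample_hits"
```

Because only lines starting with + are matched, the expected un-ignore from §03.2 (a removed line) does not trip the check.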
06.R Third Party Review Findings
Populated by /tpr-review at §06.N (this section’s TPR catches integration bugs across §02–§05).
06.N Completion Checklist
- All 06.1, 06.2, 06.3, 06.4 subsections status: complete.
- /tpr-review clean across reviewer set.
- /impl-hygiene-review clean after TPR.
- python -m scripts.plan_corpus check plans/aot-mono-completeness/section-06-test-matrix.md exits 0.
- Section frontmatter flipped to status: complete, reviewed: true.