
Section 06: Expand Fixtures + Self-Test

Status: Not Started

Goal: The diagnostic toolkit’s self-test suite runs against only 3 basic fixtures (simple.ori, clean.ori, chain.ori). These don’t exercise closures, iterators, nested structures, generics, trait dispatch, or failure modes, which are the exact code patterns that cause the most debugging churn. New fixtures ensure diagnostic scripts produce correct output for the patterns they’ll actually be used to debug. The fixture suite must also cover escaping closures, ? unwinding, recursive tree walks, COW sharing, large aggregates, and mixed sum types, all identified as blind spots by tp-help consensus.

Success Criteria:

  • At least 13 new .ori fixture files in diagnostics/fixtures/ — 14 new fixtures created
  • Each fixture exercises a distinct code pattern relevant to AOT/AIMS debugging
  • Fixtures categorized as pass (exit 0, clean RC), aims-heavy (exit 0, exercises AIMS-specific paths like COW/reuse), or expected-fail (exit non-zero, validates diagnostic detection)
  • self-test.sh runs new fixtures through diagnose-aot.sh, dual-exec-debug.sh, rc-stats.sh, ir-dump.sh, arc-dump.sh, and bisect-passes.sh (added by Section 05)
  • bisect-passes.sh exercised on at minimum closure.ori, iterator_break.ori, and generic_mono.ori (the AIMS-relevant fixtures) — all 13 pass/aims-heavy fixtures run through bisect-passes
  • Self-test assertions are feature-specific — not just “non-empty output” but assertions on expected IR markers (e.g., PartialApply for closures, Switch for match, RcInc/RcDec for RC-heavy fixtures)
  • Expected-fail fixtures use run_test_expect_fail with explicit exit code assertions distinguishing leak vs crash vs mismatch
  • All fixtures verified under both debug and release builds (cargo b and cargo b --release)
  • Satisfies mission criterion: “7+ new diagnostic fixtures covering closures, iterators, nested structures, generics, trait dispatch, and failure modes”
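
A feature-specific assertion of the kind described above can be sketched as a line count over dump text. This is a minimal sketch: assert_marker_min is a hypothetical helper name, and the sample dump is a stand-in, not real arc-dump.sh output.

```sh
#!/bin/sh
# Hypothetical helper: names and messages are illustrative,
# not taken from self-test.sh.

# assert_marker_min DUMP_TEXT MARKER MIN
# Succeeds when MARKER appears on at least MIN lines of DUMP_TEXT.
assert_marker_min() {
    dump=$1
    marker=$2
    min=$3
    # grep -c counts matching lines; -F treats the marker as a fixed string.
    # "|| true" keeps a zero-match result from aborting under "set -e".
    count=$(printf '%s\n' "$dump" | grep -c -F -- "$marker" || true)
    if [ "$count" -ge "$min" ]; then
        echo "PASS: $marker x$count (>= $min)"
        return 0
    fi
    echo "FAIL: $marker x$count (< $min)"
    return 1
}

# Stand-in for an ARC IR dump; a real dump will look different.
sample_dump='fn main:
  %1 = PartialApply @add, %xs
  RcInc %xs
  RcDec %xs'

assert_marker_min "$sample_dump" "PartialApply" 1
assert_marker_min "$sample_dump" "Switch" 1 || echo "(no Switch in a closure-only dump)"
```

Asserting a minimum count rather than an exact count keeps the check stable when unrelated codegen changes add or remove a few RC operations.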

Context: The current 3 fixtures (simple.ori — no collections/RC; clean.ori — collections, balanced RC; chain.ori — chained COW) were adequate when the toolkit was first built. But ARC/AIMS bugs predominantly appear in closure captures, iterator early-exit cleanup, nested aggregate drops, generic instantiation, and trait method dispatch — none of which are exercised. A diagnostic regression in these areas ships behind a green self-test.

Depends on: Section 05 (bisect-passes.sh must exist for self-test integration).

README ownership: Section 07 owns the diagnostics/README.md fixtures table update (see section-07-integration.md 07.4). This section creates the fixtures and the FIXTURES.md categorization file; Section 07 integrates the final table into the user-facing README.


06.1 Create core-pattern fixtures

File(s): diagnostics/fixtures/*.ori (new files)

Each fixture must: (1) compile under AOT, (2) produce deterministic output via exit code (0 = success, 1 = logic failure), (3) exercise a specific code pattern, (4) pass both ori run and AOT binary execution with identical results. Fixture names are descriptive of the pattern, not the section number. Reference existing test files in tests/valgrind/fat_matrix/ for correct Ori syntax patterns.
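
Requirement (4) amounts to comparing two exit codes. The sketch below shows one way to do that, assuming both invocations are passed in as command strings; compare_exit_codes is a hypothetical name, and the commands used here are stand-ins so the helper can be tried without a compiler checkout.

```sh
#!/bin/sh
# Sketch only: compare_exit_codes is a hypothetical helper, not part of
# the existing diagnostics scripts.

# compare_exit_codes INTERP_CMD AOT_CMD
# Runs both command strings and reports whether their exit codes agree.
compare_exit_codes() {
    interp_rc=0
    aot_rc=0
    sh -c "$1" >/dev/null 2>&1 || interp_rc=$?
    sh -c "$2" >/dev/null 2>&1 || aot_rc=$?
    if [ "$interp_rc" -eq "$aot_rc" ]; then
        echo "MATCH (exit $interp_rc)"
        return 0
    fi
    echo "MISMATCH (interpreter exit $interp_rc, AOT exit $aot_rc)"
    return 1
}

# Stand-in commands. With a real build these would be along the lines of
#   "cargo run -- run <fixture>"
#   "cargo run -- build <fixture> -o /tmp/test_fixture && /tmp/test_fixture"
compare_exit_codes "exit 0" "exit 0"
compare_exit_codes "exit 0" "exit 1" || true
```

One caveat with this shape: when the build step itself fails, the chained AOT command reports the build's exit code, which could coincide with the interpreter's. A stricter check would run the build and the binary as separate steps.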

Category: pass — all exit 0, balanced RC.

  • closure.ori — Closure capturing a collection ([int]), calling the closure, verifying captures are alive after the call. Tests closure RC: the captured value must be inc’d on capture, dec’d on closure drop. Must also include: closure passed as function argument, closure called twice (RC balance after multiple invocations). Reference syntax: tests/valgrind/fat_matrix/f04_closure_capture.ori

  • closure_escape.ori — Closures that escape their creation scope: stored in a list, passed as a parameter to another function, returned from a function, and called after the creating scope has exited. This is a GAP identified by tp-help — capture-only coverage is insufficient for RC correctness because escaping closures stress the lifetime of captured values beyond lexical scope. Reference syntax: tests/valgrind/fat_matrix/f04_closure_capture.ori (for capture patterns), tests/spec/expressions/lambdas.ori (for lambda syntax)

  • iterator_break.ori — Iterate over [str] with early break, verifying the iterator and remaining elements are properly dropped. This is the #1 ARC debugging pain point. Must include: full iteration (no break), break on first element, break on middle element, continue skipping elements. Reference syntax: tests/valgrind/fat_matrix/f19_break_continue.ori

  • iterator_complex.ori — Iterator patterns beyond simple break: nested for loops with fat values in both levels, for...yield with break producing partial collection, continue with guard filtering, map iteration and cleanup. tp-help identified single iterator_break.ori as insufficient — iterator coverage must be deeper. Reference syntax: tests/valgrind/fat_matrix/f19_break_continue.ori, tests/spec/traits/iterator/for_loop.ori

  • nested_list.ori — Nested [[str]] collection, exercising elem_dec_fn propagation for nested drops. Include: creating nested lists, accessing inner elements, passing nested lists to functions. Reference syntax: tests/valgrind/fat_matrix/f14_list_element.ori

  • trait_dispatch.ori — Trait method call through a concrete impl Trait for Type (current compiler syntax), testing that trait dispatch codegen produces balanced RC. Include: trait with required method, trait with default method, calling trait method on a value that owns fat pointers. Note: current compiler uses impl Trait for Type syntax (not impl Type: Trait — that’s approved but not yet implemented per CLAUDE.md). Reference syntax: tests/spec/traits/declaration.ori

  • pattern_match.ori — Sum type with 3+ variants including mixed scalar and fat-pointer payloads (e.g., A(x: int) | B(s: str) | C(xs: [int])), exercising tag dispatch and per-variant drops. tp-help identified this as a gap: mixed scalar/ref variants stress the decision tree codegen differently than uniform variants. Reference syntax: tests/valgrind/fat_matrix/f06_pattern_matching.ori, tests/valgrind/fat_matrix/f12_sum_payload.ori

  • map_iteration.ori — Map creation with string keys, iteration over entries, map lookup, verifying RC for both keys and values during iteration. Reference syntax: tests/valgrind/iter_rc/map_str_iteration.ori, tests/valgrind/iter_rc/map_str_for_do.ori (active executable map examples; NOT tests/spec/types/map_types.ori which is a disabled TODO corpus)

  • Verify each fixture: cargo run -- run <fixture> produces expected exit code, cargo run -- build <fixture> -o /tmp/test_fixture && /tmp/test_fixture produces the same exit code

  • Subsection close-out (06.1) — MANDATORY before starting 06.2:

    • All tasks above are [x] and verified
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection

06.2 Create ARC-interaction fixtures

File(s): diagnostics/fixtures/*.ori (new files)

These fixtures exercise ARC-specific interaction patterns that tp-help identified as blind spots. They are pass fixtures (exit 0) but are categorized as aims-heavy because they specifically stress AIMS pipeline phases.

Category: aims-heavy — all exit 0, but exercise AIMS-specific paths (COW, reuse, ? unwinding, recursion).

  • question_mark.ori: ? operator propagation with fat values in scope (heap str, [int], struct-with-fat-field). Must include: ? on Option<str> returning None, ? on Option<[int]> returning Some, chained ? with multiple fat locals in scope that must be cleaned up on early exit. tp-help identified this as mandatory ARC interaction coverage because ? triggers early-exit unwinding that must drop all live fat values. Reference syntax: tests/valgrind/fat_matrix/f15_question_mark.ori

  • recursive_tree.ori — Recursive function passing fat pointer types through recursive call frames: heap str through N levels, [int] through recursion, struct with fat field returned from recursive base case. Exercises stack-frame RC correctness across recursive depth. Reference syntax: tests/valgrind/fat_matrix/f16_recursion.ori

  • generic_mono.ori — Generic function instantiated with multiple concrete types: scalar (int), heap string (str), list ([int]), and struct-with-fat-field. tp-help identified single-type generic coverage as insufficient — monomorphization must be tested across the type matrix to verify RC analysis is correct for each instantiation. Reference syntax: tests/valgrind/fat_matrix/f10_generics.ori

  • large_aggregate.ori — Struct with 3+ int fields (>16 bytes) passed to and returned from functions, exercising ABI compliance for large aggregates. Must verify that pass-by-reference codegen does not trigger unnecessary RC operations. Catches FastISel vs full pipeline regressions. Reference syntax: tests/valgrind/fat_matrix/f10_generics.ori (for struct patterns)

  • cow_sharing.ori — COW sharing barrier exercise: create a list, alias it (shared), mutate through one alias (triggers COW clone), verify original is unchanged. Also: multi-fork (3+ references to same backing), and push-after-share on both sides. Exercises is_unique check and COW clone path. Reference syntax: tests/valgrind/cow/cow_list_push.ori

  • Verify each fixture: cargo run -- run <fixture> and cargo run -- build <fixture> -o /tmp/test_fixture && /tmp/test_fixture produce identical exit code 0

  • Verify each fixture under release build: cargo run --release -- build <fixture> -o /tmp/test_fixture && /tmp/test_fixture produces exit code 0

  • Subsection close-out (06.2) — MANDATORY before starting 06.3:

    • All tasks above are [x] and verified
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection

06.3 Create expected-fail fixtures

File(s): diagnostics/fixtures/*.ori (new files)

tp-help identified that failure fixtures were “optional and underspecified” — this is a coverage gap. Diagnostic scripts must be validated in failure mode, not just success mode. These fixtures are mandatory.

Category: expected-fail — designed to trigger specific diagnostic failures.

  • leak.ori — Program that intentionally leaks an RC value (e.g., create a circular reference or allocate without drop path). ORI_CHECK_LEAKS=1 must report a leak. diagnose-aot.sh must detect the leak. This validates that the leak detection path in diagnostic scripts actually works.

    • Safe Ori code cannot create true RC leaks (no circular references, ARC manages all allocations). Created best-effort fixture: panic with fat values in scope causes diagnose-aot.sh to report FAIL (execution exit=1) + WARN (RC Stats imbalanced: over-releases from incomplete cleanup). ORI_CHECK_LEAKS=1 does not report leaks because the panic handler bypasses ori_run_main’s return path where the leak check runs.
  • mismatch_compute.ori — Program that (via the mismatch-wrapper.sh infrastructure already in diagnostics/fixtures/) produces different interpreter vs AOT output. This validates that dual-exec-debug.sh correctly detects and reports mismatches with auto-diagnostic output. Note: The existing mismatch.ori + mismatch-wrapper.sh already serves this purpose — verify it is sufficient or extend it.

    • Verified: existing mismatch.ori + mismatch-wrapper.sh is sufficient. ORI_BIN=mismatch-wrapper.sh dual-exec-debug.sh mismatch.ori correctly detects MISMATCH (stdout “INTERP” vs “AOT”), exits 1, and produces auto-diagnostic output. No separate mismatch_compute.ori needed.
  • Subsection close-out (06.3) — MANDATORY before starting 06.4:

    • All tasks above are [x] and verified
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
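
Validating a script in failure mode means checking both the exit code and the reported reason. The sketch below shows one possible shape, under stated assumptions: expect_fail_with is a hypothetical helper (not the run_test_expect_fail already in self-test.sh), and fake_dual_exec is a stand-in for a diagnostic script.

```sh
#!/bin/sh
# Hypothetical helper, distinct from self-test.sh's run_test_expect_fail.

# expect_fail_with EXPECTED_EXIT PATTERN CMD...
# Passes only if CMD exits with EXPECTED_EXIT and its combined
# stdout+stderr contains PATTERN as a fixed string.
expect_fail_with() {
    want_rc=$1
    pattern=$2
    shift 2
    got_rc=0
    output=$("$@" 2>&1) || got_rc=$?
    if [ "$got_rc" -ne "$want_rc" ]; then
        echo "FAIL: exit $got_rc, wanted $want_rc"
        return 1
    fi
    if ! printf '%s\n' "$output" | grep -q -F -- "$pattern"; then
        echo "FAIL: output lacks '$pattern'"
        return 1
    fi
    echo "PASS: exit $got_rc, output contains '$pattern'"
}

# Stand-in for a diagnostic script that reports a mismatch and exits 1.
fake_dual_exec() {
    echo "MISMATCH: stdout differs"
    return 1
}

expect_fail_with 1 "MISMATCH" fake_dual_exec
```

Checking the pattern as well as the exit code keeps a crash from passing as a detected mismatch just because both exit non-zero.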

06.4 Fixture matrix and categorization

File(s): diagnostics/fixtures/FIXTURES.md (new file)

tp-help identified scattered fixture knowledge as a LEAK — fixture names are repeated per-script in self-test with no single source of truth for what each fixture covers. This subsection creates the SSOT.

  • Create diagnostics/fixtures/FIXTURES.md with a categorization table. Outcome: created with 18 fixtures (11 pass, 5 aims-heavy, 2 expected-fail) plus build-fail-parse.ori and mismatch-wrapper.sh infra entries. Includes the full matrix table matching the plan specification. An infra category was also added for supporting infrastructure files.

  • In FIXTURES.md, document the self-test contract for each category:

    • pass: ir-dump.sh (non-empty), arc-dump.sh (non-empty), diagnose-aot.sh (exit 0), dual-exec-debug.sh (MATCH), rc-stats.sh (produces output), bisect-passes.sh --rc-only (phase table + “Leak check: clean”)
    • aims-heavy: same as pass, PLUS bisect-passes.sh --rc-only shows non-zero RC ops, AND feature-specific IR marker assertions
    • expected-fail: diagnose-aot.sh / dual-exec-debug.sh must report failure, specific exit code + output pattern documented per fixture
  • Subsection close-out (06.4) — MANDATORY before starting 06.5:

    • All tasks above are [x] and verified
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
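
A categorization table can serve as a single source of truth only if scripts can read it mechanically. The sketch below is a minimal illustration, assuming a pipe-delimited table whose first two columns are fixture name and category. The actual FIXTURES.md layout may differ, and fixtures_in_category is a hypothetical name.

```sh
#!/bin/sh
# Assumption: rows look like "| name.ori | category | ... |".
# The real FIXTURES.md may use different columns.

# fixtures_in_category TABLE_FILE CATEGORY
# Prints one fixture name per line for rows whose category matches.
fixtures_in_category() {
    awk -F'|' -v want="$2" '
        {
            name = $2
            kind = $3
            # Trim surrounding whitespace from both cells.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", kind)
            # Header and separator rows never end in ".ori".
            if (kind == want && name ~ /\.ori$/) print name
        }
    ' "$1"
}

# Stand-in table with a few rows named in this plan.
table=$(mktemp)
cat > "$table" <<'EOF'
| Fixture | Category | Pattern |
|---|---|---|
| simple.ori | pass | no collections, no RC |
| closure.ori | pass | closure capture |
| cow_sharing.ori | aims-heavy | COW clone path |
| leak.ori | expected-fail | panic with fat values in scope |
EOF

for category in pass aims-heavy expected-fail; do
    echo "$category: $(fixtures_in_category "$table" "$category" | tr '\n' ' ')"
done
rm -f "$table"
```

In a bash script the per-category output could be loaded into arrays with mapfile, so the fixture names would live only in the table.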

06.5 Update self-test.sh coverage

File(s): diagnostics/self-test.sh

  • Update the fixture existence check at the top of self-test.sh (currently checks simple.ori, clean.ori, chain.ori only) to also require all new fixtures. Group by category (pass, aims-heavy, expected-fail) with comments. SSOT note: fixture lists in self-test.sh must be verifiable against diagnostics/fixtures/FIXTURES.md, so add a comment referencing FIXTURES.md as the canonical source. If feasible, parse FIXTURES.md to generate the fixture arrays rather than hardcoding them (eliminates the LEAK:scattered-knowledge risk identified by TPR). Outcome: updated with PASS_FIXTURES, AIMS_HEAVY_FIXTURES, and EXPECTED_FAIL_FIXTURES arrays; a comment references FIXTURES.md.

  • Add each pass fixture to the self-test matrix (ir-dump, arc-dump, diagnose-aot, dual-exec-debug, rc-stats, bisect-passes).

  • Add feature-specific assertions for aims-heavy and select pass fixtures:

    • closure.ori/closure_escape.ori: PartialApply (confirmed 5 occurrences each)
    • pattern_match.ori: Switch (confirmed 6 occurrences)
    • generic_mono.ori: “functions” in arc-dump header (confirmed “12 functions”)
    • question_mark.ori: RcDec (confirmed 23 occurrences)
    • cow_sharing.ori: RcInc (COW uniqueness via RC sharing; IsShared not in ARC IR — runtime-level check)
    • recursive_tree.ori: “functions” in arc-dump header (confirmed “5 functions”)
  • Add aims-heavy fixtures to self-test matrix with standard + feature-specific assertions.

  • Add each expected-fail fixture with specific assertions:

    • leak.ori: diagnose-aot exits non-zero + output contains “imbalance”. bisect-passes detects “exited with code 1” (panic bypasses runtime leak checker — RC_LIVE_COUNT never checked).
    • mismatch.ori (via wrapper): dual-exec exits 1 + output contains “MISMATCH”
  • Handle bisect-passes.sh exit code semantics — do NOT assert exit 0 for pass/aims-heavy; assert “Phase” and “Leak check: clean” in output.

  • Release build coverage — conditional section gated on target/release/ori, runs diagnose-aot.sh --release on closure.ori, iterator_break.ori, generic_mono.ori. SKIP if no release binary.

  • Verify: diagnostics/self-test.sh --verbose passes — 159 passed, 0 failed

  • Verify: all new self-test assertions pass in CI-equivalent conditions (clean build) — confirmed via test-all.sh (16954 passed, 0 failed)

  • Subsection close-out (06.5) — MANDATORY before starting 06.R:

    • All tasks above are [x] and verified
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
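
Two of the items above have a shape worth sketching: judging bisect-passes by its report text rather than its exit code, and skipping the release section when no release binary exists. The helper names below are hypothetical, and the sample report is a stand-in. Only the strings “Phase” and “Leak check: clean” and the path target/release/ori come from this plan.

```sh
#!/bin/sh
# Hypothetical helpers; the report layout below is illustrative only.

# assert_bisect_output OUTPUT
# Checks the report text for both required strings. The exit code of
# the run that produced OUTPUT is deliberately not consulted.
assert_bisect_output() {
    for needle in "Phase" "Leak check: clean"; do
        if ! printf '%s\n' "$1" | grep -q -F -- "$needle"; then
            echo "FAIL: missing '$needle'"
            return 1
        fi
    done
    echo "PASS"
}

# release_section RELEASE_BIN
# Reports SKIP (and succeeds) when the release binary is absent, so a
# debug-only checkout still gets a green run.
release_section() {
    if [ ! -x "$1" ]; then
        echo "SKIP: no release binary at $1"
        return 0
    fi
    echo "RUN: release checks using $1"
}

# Stand-in for a bisect-passes report.
sample='Phase      RcInc  RcDec
lower      4      4
Leak check: clean'

assert_bisect_output "$sample"
release_section "target/release/ori"
```

Returning success on SKIP is a design choice: it keeps the suite usable without a release build, at the cost that a missing binary cannot be told apart from a passing one unless the SKIP lines are surfaced in the summary.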

06.R Third Party Review Findings

  • [TPR-06-001-codex][high] section-06-fixtures.md:142 — Centralize fixture categories to remove LEAK and DRIFT. generic_mono.ori inconsistency, self-test.sh as second registry. Resolved: Fixed on 2026-04-10. Moved generic_mono.ori to 06.2 (aims-heavy), added SSOT note to 06.5 fixture list requiring FIXTURES.md cross-reference.
  • [TPR-06-002-codex][medium] section-06-fixtures.md:48 — Add large aggregate coverage promised by the goal. Resolved: Fixed on 2026-04-10. Added large_aggregate.ori fixture to 06.2 with >16B struct pattern and IR assertion.
  • [TPR-06-003-codex][medium] section-06-fixtures.md:200 — Complete expected-fail matrix with exact exit-code assertions. Resolved: Fixed on 2026-04-10. Added mismatch_compute.ori to FIXTURES.md table, replaced generic run_test_expect_fail with specific exit code + output pattern assertions.
  • [TPR-06-001-gemini][medium] section-06-fixtures.md:195 — Add mismatch_compute.ori to FIXTURES.md table. Resolved: Fixed on 2026-04-10. Same fix as [TPR-06-003-codex].
  • [TPR-06-002-gemini][low] section-06-fixtures.md:79 — Harmonize generic_mono.ori categorization. Resolved: Fixed on 2026-04-10. Same fix as [TPR-06-001-codex] — moved to 06.2 aims-heavy.
  • [TPR-06-003-gemini][medium] section-06-fixtures.md:214 - Use the --rc-only flag for bisect-passes self-test assertions. Resolved: Fixed on 2026-04-10. Updated 06.5 to specify the --rc-only flag and explain why it’s load-bearing.
  • [TPR-06-004-gemini][low] section-06-fixtures.md:180 — Correct bisect-passes coverage for simple.ori in SSOT table. Resolved: Fixed on 2026-04-10. Changed simple.ori bisect-passes from “No (trivial)” to “Yes”.
  • [TPR-06-005-gemini][medium] section-06-fixtures.md:225 — Exercise leak.ori with bisect-passes.sh to verify detection. Resolved: Fixed on 2026-04-10. Added leak.ori to bisect-passes coverage with exit 1 assertion, updated table.

06.N Completion Checklist

  • All subsections (06.1, 06.2, 06.3, 06.4, 06.5) complete
  • All pass/aims-heavy fixtures compile and run under both interpreter and AOT
  • All pass/aims-heavy fixtures produce identical results under debug and release builds
  • Expected-fail fixtures correctly trigger diagnostic detection
  • diagnostics/fixtures/FIXTURES.md exists and is the SSOT for fixture categorization
  • diagnostics/self-test.sh passes with all new fixtures — 159 passed, 0 failed
  • Feature-specific assertions validate real IR markers, not just “non-empty”
  • timeout 150 ./test-all.sh green — 16954 passed, 0 failed
  • /tpr-review passed — waived by user
  • /impl-hygiene-review passed — waived by user
  • /improve-tooling section-close sweep — per-subsection retrospectives covered all gaps; no cross-subsection patterns required new tooling
  • Strip plan annotations — zero annotations found for diagnostic-tooling-improvements plan