Section 14: Testing Framework

Goal: Configurable test enforcement with dependency-aware execution and incremental test execution during compilation

SPEC: spec/19-testing.md

DESIGN: design/11-testing/index.md

PROPOSALS:

proposals/approved/dependency-aware-testing-proposal.md — Dependency-aware test execution

proposals/approved/incremental-test-execution-proposal.md — Incremental test execution & explicit free-floating tests

proposals/approved/test-execution-model-proposal.md — Consolidated implementation model (data structures, algorithms, cache)

NOTE - Pending Syntax Changes: The approved proposals change attribute syntax:

Attribute syntax: #[skip("reason")] → #skip("reason") (Section 15.1). See Section 15 (Approved Syntax Proposals) for details. Implement the new syntax directly to avoid a later migration.

14.1 Test Requirement

14.2 Test Declaration

14.3 Test Attributes

14.4 Test Functions

14.5 Assertions

CROSS-REFERENCE: Assertion built-in functions (assert, assert_eq, assert_ne, assert_some, assert_none, assert_ok, assert_err, assert_panics, assert_panics_with) are implemented in Section 7 (Standard Library), subsection 7.5.

This section focuses on the testing framework (test declarations, dependency tracking, test runner). The assertions themselves are always-available built-in functions from the prelude.

/tpr-review passed — independent review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.
Subsection close-out (14.5) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-14.5 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 14.5: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

14.6 Test Organization

14.7 Test Execution

14.8 Compile-Fail Tests

14.9 Dependency-Aware Test Execution

PROPOSAL: proposals/approved/dependency-aware-testing-proposal.md

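When a function changes, run the tests that target it and the tests that target every function that depends on it (its callers, followed transitively up the dependency graph). Tests outside that set are skipped, which keeps incremental runs fast without missing regressions in callers.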

Test Execution Modes

Mode	Command	What Runs
Direct	`ori test --direct`	Tests for changed function only
Closure	`ori test` (default)	Changed + all callers (recursive)
Full	`ori test --full`	All tests in project
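The Closure mode in the table above amounts to a reverse reachability walk over the call graph. The following is a minimal sketch, assuming a callers map shaped like the one planned in 14.12.1; the integer FunctionId alias and the function name reverse_closure are illustrative and not taken from the compiler.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical identifier type; the real compiler presumably uses its own IDs.
type FunctionId = u32;

/// Returns the changed functions plus every function that calls them,
/// directly or transitively. `callers` maps a function to its direct callers.
fn reverse_closure(
    changed: &[FunctionId],
    callers: &HashMap<FunctionId, HashSet<FunctionId>>,
) -> HashSet<FunctionId> {
    let mut affected: HashSet<FunctionId> = changed.iter().copied().collect();
    let mut queue: VecDeque<FunctionId> = changed.iter().copied().collect();

    // Breadth-first walk up the call graph.
    while let Some(func) = queue.pop_front() {
        if let Some(direct_callers) = callers.get(&func) {
            for &caller in direct_callers {
                // `insert` returns false for functions already seen, so
                // mutually recursive functions do not loop forever.
                if affected.insert(caller) {
                    queue.push_back(caller);
                }
            }
        }
    }
    affected
}
```

Direct mode would use the changed set as-is, Closure mode would pass it through reverse_closure, and Full mode would ignore change information altogether.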

14.9.1 Dependency Graph for Tests

14.9.2 Reverse Closure Computation

14.9.3 Execution Modes

14.9.4 Change Detection

14.9.5 Integration Test Handling

Free-floating tests (declared with tests _ rather than tests @target) are treated as integration tests.

14.10 Test Utilities

These utilities were identified by comparing Ori’s test framework against the Go and Rust test frameworks.

14.10.1 Filesystem Test Support

Go provides t.TempDir() for test isolation. Ori should have similar support.

Implement: test_tempdir() — returns isolated temporary directory, auto-cleaned
- Rust Tests: library/std/testing.rs — tempdir utility
- Ori Tests: tests/spec/testing/tempdir.ori
- LLVM Support: LLVM codegen for test_tempdir
- LLVM Rust Tests: ori_llvm/tests/testing_framework_tests.rs (does not exist yet) — test_tempdir codegen
- AOT Tests: No AOT coverage yet
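On the Rust side, one possible shape for the runtime support behind test_tempdir() is a guard that creates a uniquely named directory and removes it on drop. This is only a sketch under assumptions: the TestTempDir type, the ori-test- name prefix, and the naming scheme are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Counter that keeps directory names unique within one process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// A temporary directory that is deleted when the guard is dropped.
struct TestTempDir {
    path: PathBuf,
}

impl TestTempDir {
    /// Creates a fresh directory under the system temp dir.
    /// The "ori-test-" prefix is an assumption made for this sketch.
    fn new(test_name: &str) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let dir_name = format!("ori-test-{}-{}-{}", test_name, std::process::id(), id);
        let path = std::env::temp_dir().join(dir_name);
        fs::create_dir_all(&path)?;
        Ok(TestTempDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TestTempDir {
    fn drop(&mut self) {
        // Best-effort cleanup: a failure here should not mask the test result.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Tying cleanup to Drop means the directory is removed even when the test body returns early; whether Ori exposes this through a capability or a plain function is left to the design.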

14.10.2 Environment Test Support

Go provides t.Setenv() for test-scoped environment variables. Ori should support this via capabilities.

Implement: test_setenv(name: str, value: str) — scoped env var, auto-restored
- Rust Tests: library/std/testing.rs — setenv utility
- Ori Tests: tests/spec/testing/setenv.ori
- LLVM Support: LLVM codegen for test_setenv
- LLVM Rust Tests: ori_llvm/tests/testing_framework_tests.rs (does not exist yet) — test_setenv codegen
- AOT Tests: No AOT coverage yet
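A scoped environment variable can be modelled as a guard that remembers the previous value and restores it on drop. The sketch below assumes a hypothetical ScopedEnv type; it is not the planned Ori API, only an illustration of the save-and-restore behaviour that test_setenv describes.

```rust
use std::env;
use std::ffi::OsString;

/// Restores an environment variable to its previous state when dropped.
struct ScopedEnv {
    name: String,
    previous: Option<OsString>,
}

impl ScopedEnv {
    #[allow(unused_unsafe)]
    fn set(name: &str, value: &str) -> Self {
        // Remember the old value (or its absence) before overwriting it.
        let previous = env::var_os(name);
        // `set_var` is an unsafe fn in the Rust 2024 edition because the
        // process environment is shared between threads. On older editions
        // the `unsafe` block is simply redundant.
        unsafe { env::set_var(name, value) };
        ScopedEnv { name: name.to_string(), previous }
    }
}

impl Drop for ScopedEnv {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        match self.previous.take() {
            // The variable existed before: put the old value back.
            Some(old) => unsafe { env::set_var(&self.name, old) },
            // The variable did not exist before: remove it again.
            None => unsafe { env::remove_var(&self.name) },
        }
    }
}
```

Because the environment is process-wide, tests that use such a guard cannot safely run in parallel with tests that read the same variable. The runner would need to serialize them or isolate them in separate processes.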

14.10.3 Test Cleanup Hooks

Go provides t.Cleanup() for registering cleanup functions. Ori can leverage capabilities and the with pattern.

Design: Cleanup hooks via the with pattern or explicit registration
- Rust Tests: library/std/testing.rs — cleanup hooks
- Ori Tests: tests/spec/testing/cleanup.ori
- LLVM Support: LLVM codegen for cleanup hooks
- LLVM Rust Tests: ori_llvm/tests/testing_framework_tests.rs (does not exist yet) — cleanup hooks codegen
- AOT Tests: No AOT coverage yet
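For the explicit-registration variant, a stack of closures that runs in reverse order on drop is one straightforward model. The CleanupStack type below is hypothetical. The last-registered-first ordering mirrors Go's t.Cleanup and is an assumption about what Ori would choose.

```rust
/// Collects cleanup closures and runs them in reverse registration order
/// when the stack is dropped.
struct CleanupStack {
    hooks: Vec<Box<dyn FnOnce()>>,
}

impl CleanupStack {
    fn new() -> Self {
        CleanupStack { hooks: Vec::new() }
    }

    /// Registers a closure to run at the end of the test.
    fn register<F: FnOnce() + 'static>(&mut self, hook: F) {
        self.hooks.push(Box::new(hook));
    }
}

impl Drop for CleanupStack {
    fn drop(&mut self) {
        // Pop from the back so the most recently registered hook runs first.
        while let Some(hook) = self.hooks.pop() {
            hook();
        }
    }
}
```

Reverse ordering lets a later resource depend on an earlier one: a file inside a temporary directory is cleaned up before the directory itself.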

14.10.4 Helper Function Support

Go provides t.Helper() to mark functions as test helpers (improves stack traces).

14.11 Incremental Test Execution

PROPOSAL: proposals/approved/incremental-test-execution-proposal.md

During compilation, targeted tests whose targets (or the targets' transitive dependencies) have changed are executed automatically. Free-floating tests (tests _) run only via an explicit ori test.

EXISTING INFRA (verified 2026-03-29): The --incremental CLI flag exists in main.rs. FunctionChangeMap and hash-based change detection already work in compiler/oric/src/test/change_detection/mod.rs, covered by 11 passing unit tests. The planned --only-targeted flag currently exists under the name --only-attached (ori test --only-attached). Full compilation-integrated test running (the items below) is not yet wired up.

14.11.1 Compilation-Integrated Test Running

14.11.2 CLI Integration

Command	Behavior
`ori check`	Compile + run affected targeted tests
`ori check --no-test`	Compile only, skip tests
`ori check --strict`	Fail build on test failure (for CI)
`ori test`	Run all tests (targeted + free-floating)
`ori test --only-targeted`	Run only targeted tests

14.11.3 Test Result Caching

Implement: Hash-based test caching
- Track hash of each function’s normalized AST
- Cache test results keyed by dependency hashes
- Skip tests when inputs unchanged
- Rust Tests: oric/src/test/cache.rs — caching tests
- Ori Tests: tests/spec/testing/result_caching.ori
- LLVM Support: LLVM codegen for hash-based test caching
- LLVM Rust Tests: ori_llvm/tests/testing_framework_tests.rs (does not exist yet) — test caching codegen
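The caching rule above (reuse a result only while every dependency hash is unchanged) can be sketched as follows. All names here are hypothetical and the cache is in-memory only. DefaultHasher is used for brevity; a persisted cache would need a hash that is stable across compiler versions, which DefaultHasher does not promise.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TestOutcome {
    Passed,
    Failed,
}

/// Maps a test name to the combined dependency hash it last ran against
/// and the outcome of that run.
#[derive(Default)]
struct TestResultCache {
    entries: HashMap<String, (u64, TestOutcome)>,
}

/// Folds the per-function hashes into one key. Sorting first makes the key
/// independent of the order in which dependencies are listed.
fn combine_hashes(dependency_hashes: &[u64]) -> u64 {
    let mut sorted = dependency_hashes.to_vec();
    sorted.sort_unstable();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

impl TestResultCache {
    /// Returns the cached outcome only if the dependency hashes still match.
    fn lookup(&self, test: &str, dependency_hashes: &[u64]) -> Option<TestOutcome> {
        let key = combine_hashes(dependency_hashes);
        match self.entries.get(test) {
            Some(&(cached_key, outcome)) if cached_key == key => Some(outcome),
            _ => None,
        }
    }

    /// Stores the outcome of a fresh run, replacing any older entry.
    fn record(&mut self, test: &str, dependency_hashes: &[u64], outcome: TestOutcome) {
        let key = combine_hashes(dependency_hashes);
        self.entries.insert(test.to_string(), (key, outcome));
    }
}
```

A lookup miss means the test must run. Whether a cached failure should also be skipped or always re-run is a policy choice this sketch does not make.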

14.11.4 Performance Warnings

Implement: Slow targeted test warning
- Configurable threshold (default 100ms)
- Warning suggests tests _ for slow tests
- Rust Tests: oric/src/commands/test.rs — slow test warning
- Ori Tests: tests/spec/testing/slow_warning.ori
- LLVM Support: LLVM codegen for slow test warning
- LLVM Rust Tests: ori_llvm/tests/testing_framework_tests.rs (does not exist yet) — slow test warning codegen
- AOT Tests: No AOT coverage yet

Example warning:

warning: targeted test @test_parse took 250ms
  --> src/parser.ori:45
  |
  | Targeted tests run during compilation.
  | Consider making this a free-floating test: tests _
  |
  = hint: targeted tests should complete in <100ms
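The threshold check behind a warning like the one above could look like the sketch below. The function name and the abbreviated message are illustrative. Treating a run of exactly the threshold duration as slow is an assumption read off the "<100ms" hint.

```rust
use std::time::Duration;

/// Default from the plan: targeted tests should finish in under 100ms.
const DEFAULT_SLOW_THRESHOLD: Duration = Duration::from_millis(100);

/// Returns a warning message when a targeted test reaches or exceeds the
/// threshold, or None when it finished in time.
fn slow_test_warning(test_name: &str, elapsed: Duration, threshold: Duration) -> Option<String> {
    if elapsed < threshold {
        return None;
    }
    Some(format!(
        "warning: targeted test @{} took {}ms\n  = hint: targeted tests should complete in <{}ms",
        test_name,
        elapsed.as_millis(),
        threshold.as_millis()
    ))
}
```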

/tpr-review passed — independent review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — hygiene review clean. MUST run AFTER /tpr-review is clean.
Subsection close-out (14.11) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-14.11 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 14.11: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

14.12 Test Execution Model Implementation

PROPOSAL: proposals/approved/test-execution-model-proposal.md

This section consolidates the implementation details from the Test Execution Model proposal, which unifies the dependency-aware and incremental test execution proposals.

14.12.1 Test Registry Data Structure

The TestRegistry tracks test-to-function relationships and caller graphs.

Implement: TestRegistry struct [partial] (verified 2026-03-29)
- Existing infra: TestTargetIndex in change_detection/mod.rs provides tests_for_changed() mapping (function->tests), skippable_tests() logic, and floating test detection — essentially the registry described here
- tests_for: HashMap<FunctionId, Vec<TestId>> — function → tests targeting it (partially via TestTargetIndex)
- callers: HashMap<FunctionId, HashSet<FunctionId>> — function → functions that call it (not yet implemented)
- free_floating: HashSet<TestId> — tests with tests _ (partially via floating test detection)
- Rust Tests: oric/src/test/registry.rs — registry data structure (11 existing tests in change_detection/tests.rs cover the partial implementation)
- Ori Tests: tests/spec/testing/registry.ori (directory does not exist yet)
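To make the three fields concrete, here is a minimal sketch of how such a registry might be populated and queried. The field names follow the list above. The integer ID aliases, the method names, and the convention that an empty target list stands for tests _ are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical ID types; the compiler's own types are not shown in this plan.
type FunctionId = u32;
type TestId = u32;

#[derive(Default)]
struct TestRegistry {
    /// function -> tests targeting it
    tests_for: HashMap<FunctionId, Vec<TestId>>,
    /// function -> functions that call it
    callers: HashMap<FunctionId, HashSet<FunctionId>>,
    /// tests declared with `tests _`
    free_floating: HashSet<TestId>,
}

impl TestRegistry {
    /// Registers a test. An empty target list stands in for `tests _`.
    fn add_test(&mut self, test: TestId, targets: &[FunctionId]) {
        if targets.is_empty() {
            self.free_floating.insert(test);
        }
        for &target in targets {
            self.tests_for.entry(target).or_default().push(test);
        }
    }

    /// Records that `caller` calls `callee` (stored in reverse for lookups).
    fn add_call(&mut self, caller: FunctionId, callee: FunctionId) {
        self.callers.entry(callee).or_default().insert(caller);
    }

    /// Tests that target any of `functions`, sorted and deduplicated.
    fn tests_targeting(&self, functions: &[FunctionId]) -> Vec<TestId> {
        let mut tests: Vec<TestId> = functions
            .iter()
            .filter_map(|f| self.tests_for.get(f))
            .flatten()
            .copied()
            .collect();
        tests.sort_unstable();
        tests.dedup();
        tests
    }
}
```

Storing call edges in the callee-to-callers direction is what makes the reverse closure of 14.9 a simple graph walk, since the lookup direction matches the traversal direction.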

14.12.2 Content Hashing

Content hashing determines when functions have changed.

Implement: Content hash computation [partial] (verified 2026-03-29)
- Existing infra: FunctionChangeMap::from_canon() computes per-function hashes via hash_canonical_subtree; tests body_change_detected, new_function_detected_as_changed, no_changes_detected_for_identical_canons verify correctness
- Hash function body AST (normalized: whitespace and comments stripped, source structure preserved) — partially done via canonical subtree hashing
- Include parameter types and names — not yet verified
- Include return type, capability requirements, generic constraints — not yet verified
- Rust Tests: oric/src/test/content_hash.rs — hash computation (existing tests in change_detection/tests.rs)
- Ori Tests: tests/spec/testing/content_hash.ori (directory does not exist yet)
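The intent of the hash inputs listed above can be illustrated with a toy version that hashes a signature together with a normalized body. It does not reproduce hash_canonical_subtree: the Signature struct and the text-based normalize_body are crude stand-ins for canonical AST hashing, and the comment stripping assumes // line comments.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Signature parts that the plan lists as hash inputs (hypothetical layout).
#[derive(Hash)]
struct Signature<'a> {
    params: &'a [(&'a str, &'a str)], // (name, type)
    return_type: &'a str,
    capabilities: &'a [&'a str],
    generic_constraints: &'a [&'a str],
}

/// Crude stand-in for canonicalization: drops `//` comments and collapses
/// whitespace by splitting the body into whitespace-separated tokens.
/// A real implementation would walk the AST instead of the source text.
fn normalize_body(source: &str) -> Vec<&str> {
    source
        .lines()
        .map(|line| line.split("//").next().unwrap_or(""))
        .flat_map(|line| line.split_whitespace())
        .collect()
}

/// Hashes the signature and the normalized body into one value.
fn content_hash(signature: &Signature<'_>, body_source: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    signature.hash(&mut hasher);
    normalize_body(body_source).hash(&mut hasher);
    hasher.finish()
}
```

With this shape, reformatting a body or editing a comment leaves the hash unchanged, while changing an operand or the return type produces a different hash and marks the function as changed.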

14.12.3 Cache Storage and Maintenance

Test results are cached for incremental builds.

14.12.4 `--clean` Flag Behavior

14.13 Test Pass History Cache

Record the last-passing git commit and timestamp for every test. When a test fails, display regression context: which commit it last passed on and when. This is a diagnostic aid, distinct from 14.11/14.12, which are performance optimizations (skip/reorder). We are not aware of another test runner that does this.

Motivation: When a test fails, the developer’s first question is “when did this break?” Today, the answer requires git bisect — a manual, slow process. With pass history, the test runner answers instantly: “last passed on 46873fe (2026-03-20, 3 commits ago)”. Combined with Ori’s @target annotations, it can even show per-function regression context.
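As a rough illustration of the idea, the sketch below stores one record per test and turns it into a regression hint on failure. The PassRecord and PassHistory types are hypothetical, since the data model in 14.13.1 is not reproduced here. The "N commits ago" part of the message is left out because it would need an additional git query.

```rust
use std::collections::HashMap;

/// Last known passing state of one test (hypothetical layout).
#[derive(Clone, Debug, PartialEq)]
struct PassRecord {
    commit: String,
    date: String,
}

#[derive(Default)]
struct PassHistory {
    last_pass: HashMap<String, PassRecord>,
}

impl PassHistory {
    /// Overwrites the record for `test` after a successful run.
    fn record_pass(&mut self, test: &str, commit: &str, date: &str) {
        self.last_pass.insert(
            test.to_string(),
            PassRecord { commit: commit.to_string(), date: date.to_string() },
        );
    }

    /// Regression hint for a failing test, or None if it never passed.
    fn regression_context(&self, test: &str) -> Option<String> {
        self.last_pass
            .get(test)
            .map(|r| format!("last passed on {} ({})", r.commit, r.date))
    }
}
```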

14.13.1 Cache Data Model

14.13.2 Cache File Format

14.13.3 Git Integration

Implement: current_git_commit() -> Option<String> — query git for short HEAD SHA
- Run git rev-parse --short HEAD via std::process::Command
- Return None if not in a git repo or git not installed (graceful degradation)
- Cache the result for the duration of the test run (single subprocess call)
- Rust Tests: oric/src/test/pass_history/tests.rs — git query (integration test, #[ignore] if no git)
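The bullets above map fairly directly onto std::process::Command plus a one-time cache. The sketch below uses OnceLock for the per-run cache and a helper named query_git_commit; both are choices made for this illustration rather than confirmed details of the implementation.

```rust
use std::process::Command;
use std::sync::OnceLock;

/// Runs `git rev-parse --short HEAD` once. Returns None when git is not
/// installed, the directory is not a repository, or the output is unusable.
fn query_git_commit() -> Option<String> {
    let output = Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()?; // spawning failed, for example because git is not on PATH
    if !output.status.success() {
        return None; // non-zero exit, for example outside a git repository
    }
    let sha = String::from_utf8(output.stdout).ok()?;
    let sha = sha.trim();
    if sha.is_empty() {
        None
    } else {
        Some(sha.to_string())
    }
}

/// Cached wrapper: the subprocess is spawned at most once per process.
fn current_git_commit() -> Option<String> {
    static CACHED: OnceLock<Option<String>> = OnceLock::new();
    CACHED.get_or_init(query_git_commit).clone()
}
```

Caching the None case as well means a machine without git pays for the failed lookup only once per test run.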

14.13.4 TestRunner Integration

14.13.5 Failure Output Enhancement

14.13.6 Cache Maintenance

Verification Gaps (identified 2026-03-29)

The following gaps were identified during independent verification:

GAP-14-001: No dedicated Rust unit tests for test enforcement logic — check_test_coverage() and TestEnforcement enum have zero dedicated tests; only a diagnostic rendering test touches E3010. Missing: Off/Warn/Error severity mapping, @main exclusion, empty module edge case, mixed tested/untested module.
GAP-14-002: Plan understated implemented infrastructure — sections 14.9, 14.11, 14.12 were marked not-started but have significant existing infrastructure (FunctionChangeMap, TestTargetIndex, --incremental CLI flag, 11 passing change detection tests). Status corrected to in-progress above.
GAP-14-003: Test count was stale — plan said “900+” but actual count is 4181 passed, 42 skipped. Corrected above.
GAP-14-004: @main exemption was implemented but plan marked full exemption item as todo — check_test_coverage() already excludes @main. Split into separate done/@main and todo/private-helpers items above.
GAP-14-005: No tests/spec/testing/ directory exists — plan references 20+ Ori spec test files under this directory. All existing testing-related Ori tests are in tests/spec/declarations/attributes.ori and tests/spec/source/file_structure.ori. This directory must be created when implementing remaining section 14 items.
GAP-14-006: No ori_llvm/tests/testing_framework_tests.rs file exists — plan references this file across 30+ items. LLVM coverage comes through the integration test runner with --backend=llvm. Annotations added above to all references. Create this file when LLVM-specific items are implemented, or consolidate into existing integration path.
GAP-14-007: LLVM test sub-items systematically unchecked on done items — every done item has unchecked LLVM sub-items, but LLVM test execution works end-to-end via compile_tests(), LlvmBackend, and --backend=llvm. The LLVM runner handles #skip, #compile_fail, and test execution through its backend. Sub-items clarified above.

14.14 Section Completion Checklist

All items in 14.1-14.13 have all three checkboxes marked [x]
Spec updated: spec/19-testing.md reflects implementation
CLAUDE.md updated if syntax/behavior changed
Re-evaluate against docs/compiler-design/v2/02-design-principles.md
80%+ test coverage, with tests written against the spec and design
Run full test suite: ./test-all.sh
/tpr-review passed — independent Codex review found no critical or major issues (or all findings triaged)
/impl-hygiene-review passed — implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER /tpr-review is clean.
/improve-tooling retrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which diagnostics/ scripts you ran, which command sequences you repeated, where you added ad-hoc dbg!/tracing calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE /commit-push. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See .claude/skills/improve-tooling/SKILL.md “Retrospective Mode” for the full protocol.

Exit Criteria: Tests are mandatory, dependency-aware, and run correctly

Subsection close-out (14.14) — MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-14.14 retrospective — build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 14.14: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
/sync-claude section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.