Proposal: Test Execution Model

Status: Approved Author: Eric (with Claude) Created: 2026-01-29 Approved: 2026-01-29 Affects: compiler/oric/, compiler/ori_ir/, CLI interface, .ori/cache/

NOTE: This document predates configurable test enforcement (2026-03-05). References to “mandatory testing” describe the test-enforcement = "error" mode, which is no longer the default.

Summary

Define the complete test execution model for Ori: when tests run, which tests run, and how the compiler integrates test execution into the build process. This proposal consolidates and extends the approved Dependency-Aware Testing and Incremental Test Execution proposals into an implementable specification.

The formal language specification (Testing) defines the semantics of test execution. This proposal provides the detailed implementation model—data structures, algorithms, and cache formats—that the compiler must implement to satisfy that specification.

Motivation

Ori’s core promise is code that proves itself: every function has tests, every change is verified, every effect is explicit. The testing system is not an afterthought—it’s integral to compilation.

Current state:

Test syntax is implemented (@test tests @target)
Basic test runner exists
Coverage enforcement exists
Missing: Tests don’t run automatically during compilation
Missing: No dependency-aware test selection
Missing: No incremental caching

This proposal specifies the complete execution model so that:

ori check compiles code AND runs affected tests
Changes to @foo automatically trigger tests for @foo and its callers
Unchanged code uses cached test results
Developers get immediate feedback without manual test invocation

Design

Core Principle: Tests Are Part of Compilation

In traditional languages, testing is separate from building:

build → (separate) → test

In Ori, testing is integrated into compilation:

parse → type check → test execution → codegen

A successful ori check means:

Code compiles
Affected tests pass (or are reported, in non-strict mode)

Compilation Pipeline

Source Files
    │
    ▼
┌─────────────────┐
│     Parse       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Type Check    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Test Discovery │  ← Build test registry, compute affected set
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Test Execution  │  ← Run affected attached tests
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Code Gen     │  (if requested)
└─────────────────┘

Tests run after type checking:

A test’s target must type-check before the test can execute
Test failures are reported but don’t block codegen (unless --strict)

Test Registry

The compiler builds a test registry during compilation:

struct TestRegistry {
    /// Map from function → tests that target it
    tests_for: HashMap<FunctionId, Vec<TestId>>,

    /// Map from function → functions that call it (reverse deps)
    callers: HashMap<FunctionId, HashSet<FunctionId>>,

    /// Set of floating tests (target = _)
    floating: HashSet<TestId>,
}

Built during type checking:

For each @test tests @target: add to tests_for[target]
For each function call f() inside function g: add g to callers[f]
For each @test tests _: add to floating

Change Detection

A function is changed if its content hash differs from the cached hash.

The content hash captures the function’s definition—its body, signature, and constraints. When a function’s content changes, the reverse closure algorithm (see below) propagates this to find all tests that may be affected.

Note: A function’s hash does not change when its dependencies change. The test selection algorithm handles dependency propagation separately: if @bar changes and @foo calls @bar, then @foo is in @bar’s reverse closure, and tests for @foo will run—even though @foo’s hash is unchanged.

struct ChangeDetector {
    /// Previous compilation's hashes
    cached_hashes: HashMap<FunctionId, u64>,

    /// Current compilation's hashes
    current_hashes: HashMap<FunctionId, u64>,
}

impl ChangeDetector {
    fn is_changed(&self, func: FunctionId) -> bool {
        match (self.cached_hashes.get(&func), self.current_hashes.get(&func)) {
            (Some(old), Some(new)) => old != new,
            (None, Some(_)) => true,  // New function
            (Some(_), None) => true,  // Deleted function
            (None, None) => false,
        }
    }

    fn changed_functions(&self) -> HashSet<FunctionId> {
        self.current_hashes.keys()
            .filter(|f| self.is_changed(**f))
            .copied()
            .collect()
    }
}

Content hash includes:

Function body AST (normalized: whitespace and comments stripped, source structure preserved)
Parameter types and names
Return type
Capability requirements
Generic constraints

Normalization ensures that formatting changes (whitespace, comments) do not invalidate the cache, while meaningful changes to code structure do.

Reverse Transitive Closure

When a function changes, we need to find all functions that depend on it:

impl TestRegistry {
    /// Compute all functions affected by changes to `roots`
    fn reverse_closure(&self, roots: &HashSet<FunctionId>) -> HashSet<FunctionId> {
        let mut affected = roots.clone();
        let mut queue: VecDeque<_> = roots.iter().copied().collect();

        while let Some(func) = queue.pop_front() {
            if let Some(callers) = self.callers.get(&func) {
                for caller in callers {
                    if affected.insert(*caller) {
                        queue.push_back(*caller);
                    }
                }
            }
        }

        affected
    }

    /// Find tests to run for the given changed functions
    fn affected_tests(&self, changed: &HashSet<FunctionId>) -> Vec<TestId> {
        let affected = self.reverse_closure(changed);

        affected.iter()
            .filter_map(|f| self.tests_for.get(f))
            .flatten()
            .copied()
            .collect()
    }
}

Example: Dependency Propagation

@helper (x: int) -> int = x * 2

@process (x: int) -> int = helper(x: x) + 1

@handle (x: int) -> int = process(x: x) + 10

@test_helper tests @helper () -> void = ...
@test_process tests @process () -> void = ...
@test_handle tests @handle () -> void = ...

Dependency graph:

@helper ← @process ← @handle

If @helper changes:

Changed set: {@helper}
Reverse closure: {@helper, @process, @handle}
Affected tests: {@test_helper, @test_process, @test_handle}

If @handle changes:

Changed set: {@handle}
Reverse closure: {@handle} (no callers)
Affected tests: {@test_handle}

Test Result Caching

Test results are cached keyed by the hash of all inputs:

struct TestCache {
    /// Map from (test_id, inputs_hash) → result
    results: HashMap<(TestId, u64), TestResult>,
}

impl TestCache {
    fn inputs_hash(&self, test: TestId, registry: &TestRegistry) -> u64 {
        // Hash of all target functions' content hashes
        let mut hasher = DefaultHasher::new();
        for target in &test.targets {
            if let Some(hash) = registry.function_hash(target) {
                hash.hash(&mut hasher);
            }
        }
        hasher.finish()
    }

    fn get_cached(&self, test: TestId, inputs_hash: u64) -> Option<&TestResult> {
        self.results.get(&(test, inputs_hash))
    }
}

Cache invalidation is automatic:

If any target’s hash changes, the inputs_hash changes
Old cache entries become unreachable (can be pruned)

Incremental Execution Flow

1. Load cache from .ori/cache/
2. Parse and type-check all files
3. Compute current function hashes
4. Detect changed functions (hash mismatch)
5. Compute reverse closure (affected set)
6. Find attached tests for affected set
7. For each test:
   a. Compute inputs_hash
   b. If cached result exists with same inputs_hash → skip
   c. Otherwise → execute test, cache result
8. Report results
9. Save cache to .ori/cache/

Full Compilation

On full compilation (no cache, or --clean):

All attached tests execute
Results are cached
Floating tests do NOT execute (require ori test)

Cache Storage

.ori/
├── cache/
│   ├── hashes.bin       # FunctionId → content hash
│   ├── deps.bin         # Dependency graph (callers map)
│   └── test-results/    # TestId → TestResult
└── ...

Format: Binary serialization (bincode or similar) for performance.

The .ori/ directory should be in .gitignore.

Cache Maintenance

Pruning: On successful build completion, the compiler removes cache entries for functions that no longer exist in the codebase. This prevents unbounded cache growth as code evolves.

Invalidation: Cache entries are never explicitly invalidated. Instead, the inputs_hash mechanism ensures stale entries are simply not matched—they become unreachable and are pruned on the next successful build.

Test Result States

enum TestResult {
    Pass { duration: Duration },
    Fail { message: String, location: SourceLoc },
    Skip { reason: String },
    Error { message: String },  // Could not execute
}

Non-Blocking vs Strict Mode

Non-blocking (default):

$ ori check src/

Compiling...
Running 3 affected tests...
  ✓ @test_helper (2ms)
  ✗ @test_process
    assertion failed: expected 5, got 6
    at src/lib.ori:25:5
  ✓ @test_handle (1ms)

Build succeeded with 1 test failure.

Compilation completes. Exit code 0. Developer can iterate.

Strict mode (--strict):

$ ori check --strict src/

Compiling...
Running 3 affected tests...
  ✓ @test_helper (2ms)
  ✗ @test_process
    assertion failed: expected 5, got 6

Build FAILED: 1 test failure.

Compilation fails. Exit code 1. For CI and pre-commit hooks.

Performance Warning

Targeted tests run during compilation. Slow tests degrade the development experience.

warning: attached test @test_large_parse took 350ms
  --> src/parser.ori:100:1
   |
100| @test_large_parse tests @parse () -> void = ...
   | ^^^^^^^^^^^^^^^^^ slow attached test
   |
   = note: attached tests run during compilation
   = help: consider making this a floating test: `tests _`
   = note: threshold is 100ms (configurable in ori.toml)

Configuration:

# ori.toml
[testing]
slow_test_threshold = "100ms"  # default

Floating Tests

Tests with tests _ are explicitly excluded from compilation:

@test_integration tests _ () -> void = {
    // Slow integration test with real I/O
    let result = full_pipeline(input: large_input)
    assert_ok(result: result)
}

Floating tests:

Do NOT run during ori check
Do NOT satisfy coverage requirements
Only run via explicit ori test

Use cases:

Integration tests
Performance benchmarks
Tests requiring external services
Tests with large datasets

CLI Interface

ori check

ori check [OPTIONS] <PATH>

Compile and run affected attached tests.

Options:
    --no-test     Skip test execution (compile only)
    --strict      Fail build on any test failure
    --verbose     Show all test results, not just failures
    --clean       Ignore cache, run all attached tests

ori test

ori test [OPTIONS] [PATH]

Run tests explicitly.

Options:
    --only-attached    Skip floating tests
    --filter <PATTERN> Run only tests matching pattern
    --verbose          Show all test results

Execution Matrix

Command	Attached (affected)	Attached (unaffected)	Floating
`ori check`	Run	Skip (cached)	Never
`ori check --no-test`	Never	Never	Never
`ori check --clean`	Run	Run	Never
`ori test`	Run	Run	Run
`ori test --only-attached`	Run	Run	Never

Note: --clean forces re-execution of all attached tests (ignoring cache), but does not run floating tests. Floating tests always require explicit ori test, regardless of other flags.

Implementation Plan

Phase 1: Test Registry

Add TestRegistry struct to compiler
Build tests_for map during type checking
Build callers map from call graph analysis
Identify floating tests

Phase 2: Change Detection

Implement content hashing for functions
Add ChangeDetector struct
Integrate with existing incremental compilation (if any)

Phase 3: Reverse Closure

Implement reverse_closure() algorithm
Implement affected_tests() lookup
Add tests for closure computation

Phase 4: Test Caching

Define cache file format
Implement cache loading/saving
Implement inputs_hash computation
Implement cache lookup during test execution

Phase 5: CLI Integration

Modify ori check to run tests after type checking
Add --no-test, --strict, --clean flags
Implement non-blocking result reporting
Implement strict mode failure

Phase 6: Performance Warnings

Track test execution duration
Emit warning for slow attached tests
Read threshold from ori.toml

Phase 7: Polish

Progress reporting during test execution
Parallel test execution
Cache pruning (remove stale entries)
Documentation

Testing the Implementation

The implementation should be verified with:

Unit tests for registry, closure, and cache logic
Integration tests for CLI behavior
Spec tests in tests/spec/testing/:
- incremental.ori — verify only affected tests run
- closure.ori — verify reverse closure is computed correctly
- caching.ori — verify cache hits skip execution
- strict.ori — verify --strict fails on test failure
- floating.ori — verify tests _ excluded from check

Alternatives Considered

1. Tests Run Before Type Checking

Rejected: A test’s target must type-check first. Running tests on invalid code is meaningless.

2. Tests Block Compilation by Default

Rejected: Would slow iteration. Developers want to see both compilation errors AND test failures together, then fix iteratively.

3. No Caching, Always Run All Tests

Rejected: Defeats the purpose of incremental compilation. Would be too slow for large codebases.

4. Forward Closure Instead of Reverse Closure

Rejected: Forward closure (dependencies of changed function) misses the critical case: a change to @helper should run tests for @caller because @caller’s behavior depends on @helper.

Summary

This proposal defines Ori’s test execution model:

Tests run during compilation — after type checking, before codegen
Dependency-aware — changes to @foo trigger tests for @foo and all callers
Incremental — unchanged functions use cached test results
Non-blocking by default — failures reported but don’t block compilation
Strict mode for CI — --strict fails on any test failure
Performance-conscious — warnings for slow attached tests

Combined with mandatory test coverage and capability-based mocking, this creates a system where code integrity is enforced automatically as a natural part of the development workflow.