Proposal: Test Execution Model
Status: Approved
Author: Eric (with Claude)
Created: 2026-01-29
Approved: 2026-01-29
Affects: compiler/oric/, compiler/ori_ir/, CLI interface, .ori/cache/
Summary
Define the complete test execution model for Ori: when tests run, which tests run, and how the compiler integrates test execution into the build process. This proposal consolidates and extends the approved Dependency-Aware Testing and Incremental Test Execution proposals into an implementable specification.
The formal language specification (Testing) defines the semantics of test execution. This proposal provides the detailed implementation model—data structures, algorithms, and cache formats—that the compiler must implement to satisfy that specification.
Motivation
Ori’s core promise is code that proves itself: every function has tests, every change is verified, every effect is explicit. The testing system is not an afterthought—it’s integral to compilation.
Current state:
- Test syntax is implemented (@test tests @target)
- Basic test runner exists
- Coverage enforcement exists
- Missing: Tests don’t run automatically during compilation
- Missing: No dependency-aware test selection
- Missing: No incremental caching
This proposal specifies the complete execution model so that:
- ori check compiles code AND runs affected tests
- Changes to @foo automatically trigger tests for @foo and its callers
- Unchanged code uses cached test results
- Developers get immediate feedback without manual test invocation
Design
Core Principle: Tests Are Part of Compilation
In traditional languages, testing is separate from building:
build → (separate) → test
In Ori, testing is integrated into compilation:
parse → type check → test execution → codegen
A successful ori check means:
- Code compiles
- Affected tests pass (or are reported, in non-strict mode)
Compilation Pipeline
   Source Files
         │
         ▼
┌─────────────────┐
│      Parse      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Type Check    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Test Discovery  │ ← Build test registry, compute affected set
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Test Execution  │ ← Run affected attached tests
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Code Gen     │ (if requested)
└─────────────────┘
Tests run after type checking:
- A test’s target must type-check before the test can execute
- Test failures are reported but don’t block codegen (unless --strict)
Test Registry
The compiler builds a test registry during compilation:
use std::collections::{HashMap, HashSet};
struct TestRegistry {
    /// Map from function → tests that target it
    tests_for: HashMap<FunctionId, Vec<TestId>>,
    /// Map from function → functions that call it (reverse deps)
    callers: HashMap<FunctionId, HashSet<FunctionId>>,
    /// Set of floating tests (target = _)
    floating: HashSet<TestId>,
}
Built during type checking:
- For each @test tests @target: add to tests_for[target]
- For each function call f() inside function g: add g to callers[f]
- For each @test tests _: add to floating
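The three rules above can be sketched as follows. This is a minimal illustration, not the compiler's implementation: `FunctionId` and `TestId` are hypothetical newtypes, and `TestDecl` and `build_registry` are names invented here. It assumes test declarations and call edges have already been extracted, whereas the real compiler would presumably collect them while walking its typed IR.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical ID newtypes; the compiler's real IDs may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FunctionId(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TestId(pub u32);

/// A test declaration: `Some(f)` for `tests @f`, `None` for `tests _`.
pub struct TestDecl {
    pub id: TestId,
    pub target: Option<FunctionId>,
}

#[derive(Default)]
pub struct TestRegistry {
    pub tests_for: HashMap<FunctionId, Vec<TestId>>,
    pub callers: HashMap<FunctionId, HashSet<FunctionId>>,
    pub floating: HashSet<TestId>,
}

/// Build the registry from test declarations and (caller, callee) edges.
pub fn build_registry(
    tests: &[TestDecl],
    calls: &[(FunctionId, FunctionId)],
) -> TestRegistry {
    let mut reg = TestRegistry::default();
    for t in tests {
        match t.target {
            // Attached test: index it under its target function.
            Some(f) => reg.tests_for.entry(f).or_default().push(t.id),
            // Floating test: remember it, but attach it to nothing.
            None => {
                reg.floating.insert(t.id);
            }
        }
    }
    for &(caller, callee) in calls {
        // `caller` calls `callee`, so record `caller` under callers[callee].
        reg.callers.entry(callee).or_default().insert(caller);
    }
    reg
}
```

Storing the call graph in reverse (callee to callers) is what makes the later closure computation a plain graph walk from the changed functions.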
Change Detection
A function is changed if its content hash differs from the cached hash.
The content hash captures the function’s definition—its body, signature, and constraints. When a function’s content changes, the reverse closure algorithm (see below) propagates this to find all tests that may be affected.
Note: A function’s hash does not change when its dependencies change. The test selection algorithm handles dependency propagation separately: if @bar changes and @foo calls @bar, then @foo is in @bar’s reverse closure, and tests for @foo will run—even though @foo’s hash is unchanged.
struct ChangeDetector {
    /// Previous compilation's hashes
    cached_hashes: HashMap<FunctionId, u64>,
    /// Current compilation's hashes
    current_hashes: HashMap<FunctionId, u64>,
}
impl ChangeDetector {
    fn is_changed(&self, func: FunctionId) -> bool {
        match (self.cached_hashes.get(&func), self.current_hashes.get(&func)) {
            (Some(old), Some(new)) => old != new,
            (None, Some(_)) => true, // New function
            (Some(_), None) => true, // Deleted function
            (None, None) => false,
        }
    }
    fn changed_functions(&self) -> HashSet<FunctionId> {
        // Walk both maps so that deleted functions (present only in the
        // cache) are reported alongside new and modified ones.
        self.cached_hashes.keys()
            .chain(self.current_hashes.keys())
            .filter(|f| self.is_changed(**f))
            .copied()
            .collect()
    }
}
Content hash includes:
- Function body AST (normalized: whitespace and comments stripped, source structure preserved)
- Parameter types and names
- Return type
- Capability requirements
- Generic constraints
Normalization ensures that formatting changes (whitespace, comments) do not invalidate the cache, while meaningful changes to code structure do.
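As a rough illustration of how such a hash could be computed, the sketch below hashes a hypothetical `FunctionDef` record holding the listed components. The record layout, `normalize_body`, and `content_hash` are assumptions made for this example; in particular, normalization here works on raw text, while the proposal describes normalizing the AST.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, already-normalized view of a function definition.
#[derive(Hash)]
pub struct FunctionDef {
    /// Body tokens with whitespace and comments already stripped.
    pub body_tokens: Vec<String>,
    /// (name, type) pairs for each parameter.
    pub params: Vec<(String, String)>,
    pub return_type: String,
    pub capabilities: Vec<String>,
    pub generic_constraints: Vec<String>,
}

/// Split source into tokens on whitespace, dropping `//` line comments.
/// A text-level stand-in for AST normalization; it would mishandle
/// a `//` inside a string literal.
pub fn normalize_body(src: &str) -> Vec<String> {
    src.lines()
        .map(|line| line.split("//").next().unwrap_or(""))
        .flat_map(|line| line.split_whitespace())
        .map(str::to_owned)
        .collect()
}

/// Hash every component that contributes to the function's identity.
pub fn content_hash(def: &FunctionDef) -> u64 {
    let mut hasher = DefaultHasher::new();
    def.hash(&mut hasher);
    hasher.finish()
}
```

One caveat worth keeping in mind: the algorithm behind Rust's `DefaultHasher` is not guaranteed to stay the same across Rust releases, so a cache that persists hashes on disk would likely want a hash function with a fixed, documented algorithm.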
Reverse Transitive Closure
When a function changes, we need to find all functions that depend on it:
use std::collections::VecDeque;
impl TestRegistry {
    /// Compute all functions affected by changes to `roots`
    fn reverse_closure(&self, roots: &HashSet<FunctionId>) -> HashSet<FunctionId> {
        let mut affected = roots.clone();
        let mut queue: VecDeque<_> = roots.iter().copied().collect();
        while let Some(func) = queue.pop_front() {
            if let Some(callers) = self.callers.get(&func) {
                for caller in callers {
                    if affected.insert(*caller) {
                        queue.push_back(*caller);
                    }
                }
            }
        }
        affected
    }
    /// Find tests to run for the given changed functions
    fn affected_tests(&self, changed: &HashSet<FunctionId>) -> Vec<TestId> {
        let affected = self.reverse_closure(changed);
        // A test with several affected targets must still run only once.
        let mut seen = HashSet::new();
        affected.iter()
            .filter_map(|f| self.tests_for.get(f))
            .flatten()
            .copied()
            .filter(|t| seen.insert(*t))
            .collect()
    }
}
Example: Dependency Propagation
@helper (x: int) -> int = x * 2
@process (x: int) -> int = helper(x: x) + 1
@handle (x: int) -> int = process(x: x) + 10
@test_helper tests @helper () -> void = ...
@test_process tests @process () -> void = ...
@test_handle tests @handle () -> void = ...
Dependency graph:
@helper ← @process ← @handle
If @helper changes:
- Changed set: {@helper}
- Reverse closure: {@helper, @process, @handle}
- Affected tests: {@test_helper, @test_process, @test_handle}
If @handle changes:
- Changed set: {@handle}
- Reverse closure: {@handle} (no callers)
- Affected tests: {@test_handle}
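The two cases can be reproduced with a small standalone sketch. It uses function names as IDs for readability and a hypothetical `example_callers` helper that hard-codes the reverse call graph of this example; the walk itself follows the same breadth-first approach as reverse_closure.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk over the reverse call graph, using names as IDs.
pub fn reverse_closure<'a>(
    callers: &HashMap<&'a str, Vec<&'a str>>,
    roots: &[&'a str],
) -> HashSet<&'a str> {
    let mut affected: HashSet<&'a str> = roots.iter().copied().collect();
    let mut queue: VecDeque<&'a str> = roots.iter().copied().collect();
    while let Some(func) = queue.pop_front() {
        // Functions with no recorded callers contribute nothing further.
        for &caller in callers.get(func).into_iter().flatten() {
            if affected.insert(caller) {
                queue.push_back(caller);
            }
        }
    }
    affected
}

/// Reverse call graph for the example: callee -> direct callers.
pub fn example_callers() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("@helper", vec!["@process"]),
        ("@process", vec!["@handle"]),
    ])
}
```

Calling `reverse_closure(&example_callers(), &["@helper"])` yields all three functions, while starting from `"@handle"` yields only `@handle`, matching the sets listed above.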
Test Result Caching
Test results are cached keyed by the hash of all inputs:
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
struct TestCache {
    /// Map from (test_id, inputs_hash) → result
    results: HashMap<(TestId, u64), TestResult>,
}
impl TestCache {
    fn inputs_hash(&self, test: TestId, registry: &TestRegistry) -> u64 {
        // Hash of all target functions' content hashes
        let mut hasher = DefaultHasher::new();
        for target in &test.targets {
            if let Some(hash) = registry.function_hash(target) {
                hash.hash(&mut hasher);
            }
        }
        hasher.finish()
    }
    fn get_cached(&self, test: TestId, inputs_hash: u64) -> Option<&TestResult> {
        self.results.get(&(test, inputs_hash))
    }
}
Cache invalidation is automatic:
- If any target’s hash changes, the inputs_hash changes
- Old cache entries become unreachable (can be pruned)
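A minimal sketch of this keying scheme, assuming a simplified `TestResult` and a hypothetical `store` method (the proposal only shows the lookup side):

```rust
use std::collections::HashMap;

// Hypothetical test ID newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct TestId(pub u32);

// Simplified result type for illustration only.
#[derive(Clone, Debug, PartialEq)]
pub enum TestResult {
    Pass,
    Fail(String),
}

#[derive(Default)]
pub struct TestCache {
    results: HashMap<(TestId, u64), TestResult>,
}

impl TestCache {
    /// Record a result under the inputs hash it was produced with.
    pub fn store(&mut self, test: TestId, inputs_hash: u64, result: TestResult) {
        self.results.insert((test, inputs_hash), result);
    }

    /// A hit requires both the same test and the same inputs hash.
    pub fn get_cached(&self, test: TestId, inputs_hash: u64) -> Option<&TestResult> {
        self.results.get(&(test, inputs_hash))
    }
}
```

Because the hash is part of the key, a result stored under an old hash is never returned for a new one. No explicit invalidation step is needed; the old entry simply stops being looked up.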
Incremental Execution Flow
1. Load cache from .ori/cache/
2. Parse and type-check all files
3. Compute current function hashes
4. Detect changed functions (hash mismatch)
5. Compute reverse closure (affected set)
6. Find attached tests for affected set
7. For each test:
   a. Compute inputs_hash
   b. If cached result exists with same inputs_hash → skip
   c. Otherwise → execute test, cache result
8. Report results
9. Save cache to .ori/cache/
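Step 7 is the core of the loop and can be sketched on its own. The function below is illustrative: `run_incremental` is a name invented here, tests are identified by name, results are reduced to a pass/fail flag, and the inputs hashes are assumed to have been computed already (step 7a).

```rust
use std::collections::HashMap;

/// Steps 7b-7c: reuse cached results where the inputs hash matches,
/// otherwise run the test and record its result.
/// Returns the names of the tests that were actually executed.
pub fn run_incremental<'a>(
    affected: &[(&'a str, u64)],
    cache: &mut HashMap<(&'a str, u64), bool>,
    mut execute: impl FnMut(&str) -> bool,
) -> Vec<&'a str> {
    let mut executed = Vec::new();
    for &(test, inputs_hash) in affected {
        if cache.contains_key(&(test, inputs_hash)) {
            continue; // 7b: cache hit, skip execution
        }
        let passed = execute(test); // 7c: run the test
        cache.insert((test, inputs_hash), passed); // 7c: cache its result
        executed.push(test);
    }
    executed
}
```

Running the same affected set twice against the same cache executes nothing the second time, which is the behavior an unchanged rebuild should show.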
Full Compilation
On full compilation (no cache, or --clean):
- All attached tests execute
- Results are cached
- Floating tests do NOT execute (they require ori test)
Cache Storage
.ori/
├── cache/
│ ├── hashes.bin # FunctionId → content hash
│ ├── deps.bin # Dependency graph (callers map)
│ └── test-results/ # TestId → TestResult
└── ...
Format: Binary serialization (bincode or similar) for performance.
The .ori/ directory should be in .gitignore.
Cache Maintenance
Pruning: On successful build completion, the compiler removes cache entries for functions that no longer exist in the codebase. This prevents unbounded cache growth as code evolves.
Invalidation: Cache entries are never explicitly invalidated. Instead, the inputs_hash mechanism ensures stale entries are simply not matched—they become unreachable and are pruned on the next successful build.
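Both kinds of cleanup reduce to filtering a map. The sketch below is one possible shape, with hypothetical function names and string keys standing in for FunctionId and TestId; it is not a description of the on-disk format.

```rust
use std::collections::{HashMap, HashSet};

/// Drop hash entries for functions that no longer exist.
/// Returns how many entries were removed.
pub fn prune_hashes(
    cached_hashes: &mut HashMap<String, u64>,
    live_functions: &HashSet<String>,
) -> usize {
    let before = cached_hashes.len();
    cached_hashes.retain(|name, _| live_functions.contains(name));
    before - cached_hashes.len()
}

/// Drop test results whose inputs hash no longer matches the current one.
/// Results for tests absent from `current_inputs` are dropped as well.
pub fn prune_results(
    results: &mut HashMap<(String, u64), bool>,
    current_inputs: &HashMap<String, u64>,
) {
    results.retain(|(test, hash), _| current_inputs.get(test) == Some(hash));
}
```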
Test Result States
enum TestResult {
    Pass { duration: Duration },
    Fail { message: String, location: SourceLoc },
    Skip { reason: String },
    Error { message: String }, // Could not execute
}
Non-Blocking vs Strict Mode
Non-blocking (default):
$ ori check src/
Compiling...
Running 3 affected tests...
✓ @test_helper (2ms)
✗ @test_process
assertion failed: expected 5, got 6
at src/lib.ori:25:5
✓ @test_handle (1ms)
Build succeeded with 1 test failure.
Compilation completes. Exit code 0. Developer can iterate.
Strict mode (--strict):
$ ori check --strict src/
Compiling...
Running 3 affected tests...
✓ @test_helper (2ms)
✗ @test_process
assertion failed: expected 5, got 6
Build FAILED: 1 test failure.
Compilation fails. Exit code 1. For CI and pre-commit hooks.
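The difference between the two modes comes down to how the exit code is derived. A small sketch follows, with a hypothetical `exit_code` function. The assumption that a compile error yields exit code 1 in both modes is not stated in this proposal.

```rust
/// Exit code for `ori check` under the two modes.
/// Assumption: compile errors fail the build regardless of mode.
pub fn exit_code(compile_ok: bool, test_failures: usize, strict: bool) -> i32 {
    if !compile_ok {
        return 1;
    }
    if strict && test_failures > 0 {
        1 // strict: any test failure fails the build
    } else {
        0 // non-blocking: failures are reported only
    }
}
```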
Performance Warning
Attached tests run during compilation, so slow ones degrade the development experience. The compiler warns when an attached test exceeds the configured threshold:
warning: attached test @test_large_parse took 350ms
--> src/parser.ori:100:1
|
100| @test_large_parse tests @parse () -> void = ...
| ^^^^^^^^^^^^^^^^^ slow attached test
|
= note: attached tests run during compilation
= help: consider making this a floating test: `tests _`
= note: threshold is 100ms (configurable in ori.toml)
Configuration:
# ori.toml
[testing]
slow_test_threshold = "100ms" # default
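A sketch of how the threshold string might be interpreted and applied. The accepted suffixes ("ms" and "s") and the strict greater-than comparison are assumptions; the proposal does not define the duration grammar for ori.toml.

```rust
use std::time::Duration;

/// Parse a threshold such as "100ms" or "2s". Only these two suffixes
/// are handled here; anything else is rejected.
pub fn parse_threshold(s: &str) -> Option<Duration> {
    // Check "ms" first, since it also ends in 's'.
    if let Some(ms) = s.strip_suffix("ms") {
        ms.trim().parse::<u64>().ok().map(Duration::from_millis)
    } else if let Some(secs) = s.strip_suffix('s') {
        secs.trim().parse::<u64>().ok().map(Duration::from_secs)
    } else {
        None
    }
}

/// An attached test counts as slow when it runs longer than the threshold.
pub fn is_slow(elapsed: Duration, threshold: Duration) -> bool {
    elapsed > threshold
}
```

With the default of "100ms", the 350ms test in the warning above would be flagged, while the 2ms and 1ms tests from the earlier output would not.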
Floating Tests
Tests with tests _ are explicitly excluded from compilation:
@test_integration tests _ () -> void = {
    // Slow integration test with real I/O
    let result = full_pipeline(input: large_input)
    assert_ok(result: result)
}
Floating tests:
- Do NOT run during ori check
- Do NOT satisfy coverage requirements
- Only run via explicit ori test
Use cases:
- Integration tests
- Performance benchmarks
- Tests requiring external services
- Tests with large datasets
CLI Interface
ori check
ori check [OPTIONS] <PATH>
Compile and run affected attached tests.
Options:
--no-test Skip test execution (compile only)
--strict Fail build on any test failure
--verbose Show all test results, not just failures
--clean Ignore cache, run all attached tests
ori test
ori test [OPTIONS] [PATH]
Run tests explicitly.
Options:
--only-attached Skip floating tests
--filter <PATTERN> Run only tests matching pattern
--verbose Show all test results
Execution Matrix
| Command | Attached (affected) | Attached (unaffected) | Floating |
|---|---|---|---|
| ori check | Run | Skip (cached) | Never |
| ori check --no-test | Never | Never | Never |
| ori check --clean | Run | Run | Never |
| ori test | Run | Run | Run |
| ori test --only-attached | Run | Run | Never |
Note: --clean forces re-execution of all attached tests (ignoring cache), but does not run floating tests. Floating tests always require explicit ori test, regardless of other flags.
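The matrix can also be read as a single decision function. The encoding below is an illustrative sketch; the enum and function names are hypothetical and only mirror the rows and columns of the table.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Command {
    Check,
    CheckNoTest,
    CheckClean,
    Test,
    TestOnlyAttached,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TestKind {
    AttachedAffected,
    AttachedUnaffected,
    Floating,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Action {
    Run,
    SkipCached,
    Never,
}

/// Look up one cell of the execution matrix.
pub fn action_for(cmd: Command, kind: TestKind) -> Action {
    match (cmd, kind) {
        // --no-test disables all test execution.
        (Command::CheckNoTest, _) => Action::Never,
        // Plain `ori test` runs everything, floating tests included.
        (Command::Test, _) => Action::Run,
        // Every other command leaves floating tests alone.
        (_, TestKind::Floating) => Action::Never,
        // Only plain `ori check` reuses cached results.
        (Command::Check, TestKind::AttachedUnaffected) => Action::SkipCached,
        _ => Action::Run,
    }
}
```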
Implementation Plan
Phase 1: Test Registry
- Add TestRegistry struct to compiler
- Build tests_for map during type checking
- Build callers map from call graph analysis
- Identify floating tests
Phase 2: Change Detection
- Implement content hashing for functions
- Add ChangeDetector struct
- Integrate with existing incremental compilation (if any)
Phase 3: Reverse Closure
- Implement reverse_closure() algorithm
- Implement affected_tests() lookup
- Add tests for closure computation
Phase 4: Test Caching
- Define cache file format
- Implement cache loading/saving
- Implement inputs_hash computation
- Implement cache lookup during test execution
Phase 5: CLI Integration
- Modify ori check to run tests after type checking
- Add --no-test, --strict, --clean flags
- Implement non-blocking result reporting
- Implement strict mode failure
Phase 6: Performance Warnings
- Track test execution duration
- Emit warning for slow attached tests
- Read threshold from ori.toml
Phase 7: Polish
- Progress reporting during test execution
- Parallel test execution
- Cache pruning (remove stale entries)
- Documentation
Testing the Implementation
The implementation should be verified with:
- Unit tests for registry, closure, and cache logic
- Integration tests for CLI behavior
- Spec tests in tests/spec/testing/:
  - incremental.ori: verify only affected tests run
  - closure.ori: verify reverse closure is computed correctly
  - caching.ori: verify cache hits skip execution
  - strict.ori: verify --strict fails on test failure
  - floating.ori: verify tests _ excluded from check
Alternatives Considered
1. Tests Run Before Type Checking
Rejected: A test’s target must type-check first. Running tests on invalid code is meaningless.
2. Tests Block Compilation by Default
Rejected: Would slow iteration. Developers want to see both compilation errors AND test failures together, then fix iteratively.
3. No Caching, Always Run All Tests
Rejected: Defeats the purpose of incremental compilation. Would be too slow for large codebases.
4. Forward Closure Instead of Reverse Closure
Rejected: Forward closure (dependencies of changed function) misses the critical case: a change to @helper should run tests for @caller because @caller’s behavior depends on @helper.
Summary
This proposal defines Ori’s test execution model:
- Tests run during compilation: after type checking, before codegen
- Dependency-aware: changes to @foo trigger tests for @foo and all callers
- Incremental: unchanged functions use cached test results
- Non-blocking by default: failures reported but don’t block compilation
- Strict mode for CI: --strict fails on any test failure
- Performance-conscious: warnings for slow attached tests
Combined with mandatory test coverage and capability-based mocking, this creates a system where code integrity is enforced automatically as a natural part of the development workflow.