Section 02: Subprocess Orchestrator
Status: Not Started
Goal: Replace the in-process run_file_llvm() call with subprocess spawning. Each spec test file is compiled and executed in a separate ori test --backend=llvm --json <file> process. The orchestrator collects results via JSON stdout, detects crashes via exit codes, and aggregates into the existing TestSummary.
Success Criteria:
- LLVM backend spec tests complete without crashing the parent process
- Worker crashes produce `BackendCrash` outcomes that block the test gate
- Worker timeouts are detected and reported
- Pass/fail/skip/lcfail counts match in-process execution for non-crashing files
- Satisfies mission criteria: `./test-all.sh` passes, crashes are real failures
Context: The current architecture calls run_file_llvm() in-process for each spec test file. When LLVM C++ encounters malformed IR (from unresolved type variables, missing monomorphization, etc.), it crashes with SIGSEGV. Rust’s catch_unwind wraps the compilation but cannot catch C++ signals. The crash kills the entire test runner process, failing ./test-all.sh and blocking the pre-commit hook. Moving to subprocess-per-file provides OS-level fault containment.
Reference implementations:
- Zig `src/Compilation.zig:6304-6334`: spawns clang as a subprocess per C object file, captures exit code + stderr, handles crashes via the exit code
- Rust `src/tools/compiletest/src/executor.rs:66-88`: per-test deadline tracking with `try_wait()` polling
Depends on: Section 01 (JSON output protocol — the --json flag and JsonFileSummary types).
02.1 Worker Spawning and Result Collection
File(s): new compiler/oric/src/test/runner/llvm_worker.rs (orchestrator module)
Extract the subprocess orchestration logic into a new llvm_worker.rs module. The existing llvm_backend.rs (545 lines) remains as the worker’s in-process execution path (used when ori test --backend=llvm --json is invoked on a single file). The new module handles spawning and result collection.
File size constraint: llvm_worker.rs must stay under 500 lines. Target: ~200-300 lines for spawn + collect + extract + per-file orchestrator. Pool logic is in 02.3 and may need its own submodule if combined total exceeds 500.
TDD ordering: write tests FIRST, verify they fail, then implement.
- Create `compiler/oric/src/test/runner/llvm_worker.rs` — the orchestrator module
- Declare `#[cfg(feature = "llvm")] mod llvm_worker;` in `runner/mod.rs` (alongside the existing `#[cfg(feature = "llvm")] mod llvm_backend;`)
- Binary path resolution: `current_exe()` is resolved once in `run_llvm_tests_isolated()` (see 02.3) and passed to all worker spawn calls. No per-file resolution. `current_exe()` returns the actual binary path (e.g., `target/release/ori` from test-all.sh, or `target/debug/ori` in dev).
- 02.1.T — Tests first (in `compiler/oric/src/test/runner/llvm_worker/tests.rs` — sibling `tests.rs` pattern; a sketch of the print-pollution test follows below):
  - `test_extract_framed_json_success` — stdout with sentinels and JSON between them returns `Some(json_content)`
  - `test_extract_framed_json_with_print_pollution` — stdout has Ori `print()` output before/after the sentinels, JSON is still extracted correctly
  - `test_extract_framed_json_missing_begin` — no begin sentinel returns `None`
  - `test_extract_framed_json_missing_end` — begin but no end sentinel returns `None`
  - `test_extract_framed_json_empty_content` — sentinels with nothing between them return `Some("")`
  - `test_spawn_worker_good_file` — spawn `current_exe() test --backend=llvm --json tests/spec/types/primitives.ori`, verify exit 0 and sentinel-framed JSON on stdout
  - `test_spawn_worker_nonexistent_file` — spawn with a nonexistent file, verify non-zero exit (1 or 2) and either no sentinel frame or valid JSON carrying an error
  - Verify tests fail before implementing
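A minimal sketch of the print-pollution test, assuming `extract_framed_json` and the sentinel constants from `json_protocol` (Section 01) are in scope; the JSON payload is illustrative:

```rust
#[test]
fn test_extract_framed_json_with_print_pollution() {
    // Ori print() noise surrounds the frame; only the framed JSON should survive.
    let stdout = format!(
        "noise from print()\n{JSON_BEGIN_SENTINEL}\n[{{\"file\":\"x.ori\"}}]\n{JSON_END_SENTINEL}\ntrailing noise"
    );
    assert_eq!(
        extract_framed_json(&stdout),
        Some("[{\"file\":\"x.ori\"}]")
    );
}
```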
- Implement `extract_framed_json(stdout: &str) -> Option<&str>` (a minimal sketch follows below):
  - Scan for `JSON_BEGIN_SENTINEL` and `JSON_END_SENTINEL` (imported from `json_protocol`)
  - Return the content between the sentinels (trimmed)
  - Return `None` if either sentinel is missing
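A minimal sketch, assuming the sentinels are `&str` constants exported by `json_protocol` (the import path is illustrative):

```rust
use super::json_protocol::{JSON_BEGIN_SENTINEL, JSON_END_SENTINEL};

fn extract_framed_json(stdout: &str) -> Option<&str> {
    // Find the begin sentinel, then look for the end sentinel after it.
    let begin = stdout.find(JSON_BEGIN_SENTINEL)?;
    let after_begin = begin + JSON_BEGIN_SENTINEL.len();
    let end = stdout[after_begin..].find(JSON_END_SENTINEL)?;
    // Anything outside the frame (e.g. Ori print() output) is ignored.
    Some(stdout[after_begin..after_begin + end].trim())
}
```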
- Implement `spawn_llvm_worker(binary: &Path, file: &Path, config: &TestRunnerConfig) -> std::io::Result<Child>`:

```rust
fn spawn_llvm_worker(
    binary: &Path,
    file: &Path,
    config: &TestRunnerConfig,
) -> std::io::Result<Child> {
    let mut cmd = Command::new(binary);
    cmd.arg("test")
        .arg("--backend=llvm")
        .arg("--json")
        .arg(file)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());
    // Forward filter if present
    if let Some(ref filter) = config.filter {
        cmd.arg(format!("--filter={filter}"));
    }
    cmd.spawn()
}
```
- Implement `collect_worker_result(child: Child, file: &Path, timeout: Duration, interner: &StringInterner) -> FileSummary` (a sketch of the dispatch follows below):
  - Wait for the child to exit (with timeout — see 02.2 for `wait_with_timeout`)
  - Signal death (Unix: `status.signal().is_some()`): worker crashed -> `crash_summary()` (see 02.2)
  - Exit code 0 or 1: read stdout, call `extract_framed_json()`, parse as `Vec<JsonFileSummary>`, convert the first element via `into_file_summary()`. Anything outside the sentinel frame is discarded.
  - Exit code 2: no tests found -> empty `FileSummary` with `results: []`
  - JSON parse failure (no sentinel frame, or malformed content): fall back to `crash_summary()` with the message “worker exited {code} with no JSON output” and include the last 5 lines of stderr
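A sketch of that dispatch, assuming the helpers from this subsection and 02.2 (`wait_with_timeout`, `detect_crash`, `crash_summary`) plus `into_file_summary` from Section 01; the use of `serde_json` and the exact message wording are assumptions, not the final implementation:

```rust
use std::io::Read;

fn collect_worker_result(
    mut child: Child,
    file: &Path,
    timeout: Duration,
    interner: &StringInterner,
) -> FileSummary {
    let status = match wait_with_timeout(&mut child, timeout) {
        Ok(status) => status,
        Err(WaitError::Timeout { elapsed }) => {
            return crash_summary(file, format!("worker timed out after {elapsed:?}"), interner);
        }
        Err(WaitError::Io(e)) => {
            return crash_summary(file, format!("worker wait failed: {e}"), interner);
        }
    };

    // Drain the pipe after exit (real code may prefer wait_with_output()).
    let mut stdout = String::new();
    if let Some(mut out) = child.stdout.take() {
        let _ = out.read_to_string(&mut stdout);
    }

    // Signal death takes priority over any partial output (see 02.2).
    if let Some(msg) = detect_crash(status) {
        return crash_summary(file, msg, interner);
    }

    if status.code() == Some(2) {
        return FileSummary::new(file.to_path_buf()); // exit 2: no tests found
    }

    match extract_framed_json(&stdout)
        .and_then(|json| serde_json::from_str::<Vec<JsonFileSummary>>(json).ok())
        .and_then(|summaries| summaries.into_iter().next())
    {
        Some(first) => first.into_file_summary(interner),
        // (02.2 also appends the last 5 lines of stderr to this message.)
        None => crash_summary(
            file,
            format!(
                "worker exited {} with no JSON output",
                status.code().unwrap_or(-1)
            ),
            interner,
        ),
    }
}
```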
- Implement `run_file_llvm_isolated(file: &Path, binary: &Path, config: &TestRunnerConfig, interner: &StringInterner) -> FileSummary` — the top-level per-file orchestrator function. The `interner` param is used only for `crash_summary()` and `into_file_summary()` re-interning.
- Verify all tests from 02.1.T now PASS
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (02.1) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-02.1 retrospective` — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.1: no tooling gaps”. Update this subsection’s `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
02.2 Crash and Timeout Detection
File(s): compiler/oric/src/test/runner/llvm_worker.rs
Handle the two failure modes that in-process execution can’t survive: worker signal death and worker hangs.
TDD ordering: write tests FIRST for detect_crash, crash_summary, and wait_with_timeout.
- 02.2.T — Tests first (in `compiler/oric/src/test/runner/llvm_worker/tests.rs`; a sketch of the SIGSEGV test follows below):
  - `test_detect_crash_sigsegv` — spawn `sh -c "kill -11 $$"`, wait, pass the exit status to `detect_crash`, verify it returns `Some("worker killed by SIGSEGV (signal 11)")` (Unix-only, `#[cfg(unix)]`)
  - `test_detect_crash_sigabrt` — spawn `sh -c "kill -6 $$"`, verify `Some("worker killed by SIGABRT (signal 6)")`
  - `test_detect_crash_normal_exit` — spawn a process that exits 0, verify `detect_crash` returns `None`
  - `test_detect_crash_error_exit` — spawn a process that exits 1, verify `detect_crash` returns `None`
  - `test_wait_with_timeout_completes` — spawn `true` (exits immediately), verify `wait_with_timeout(1s)` returns `Ok(status)`
  - `test_wait_with_timeout_kills_slow_process` — spawn `sleep 999`, verify `wait_with_timeout(100ms)` returns `Err(WaitError::Timeout { .. })` promptly
  - `test_crash_summary_has_backend_crash` — call `crash_summary`, verify the result has `backend_crash == 1`, `has_failures() == true`, and a single `BackendCrash` test result
  - Verify tests fail before implementing
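A sketch of the SIGSEGV test, assuming `detect_crash` is in scope; `sh -c "kill -11 $$"` makes the child die by signal 11 rather than exit with a code:

```rust
#[cfg(unix)]
#[test]
fn test_detect_crash_sigsegv() {
    use std::process::Command;

    // The shell sends SIGSEGV to itself, so the child's ExitStatus carries
    // signal 11 and no exit code.
    let status = Command::new("sh")
        .arg("-c")
        .arg("kill -11 $$")
        .status()
        .expect("spawn sh");

    assert_eq!(
        detect_crash(status).as_deref(),
        Some("worker killed by SIGSEGV (signal 11)")
    );
}
```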
- Crash detection (`detect_crash`): in `collect_worker_result`, check for signal death:
  - On Unix: use `status.signal()` from `std::os::unix::process::ExitStatusExt`. `status.code()` returns `None` for signal-killed processes (the 128+N convention is shell-only, not Rust).
  - On non-Unix: signal detection is unavailable, so `detect_crash` returns `None` and crashes fall through to the JSON-parse-failure path.

```rust
#[cfg(unix)]
fn detect_crash(status: ExitStatus) -> Option<String> {
    use std::os::unix::process::ExitStatusExt;
    if let Some(signal) = status.signal() {
        let sig_name = match signal {
            11 => "SIGSEGV",
            6 => "SIGABRT",
            _ => "unknown signal",
        };
        Some(format!("worker killed by {sig_name} (signal {signal})"))
    } else {
        None
    }
}

#[cfg(not(unix))]
fn detect_crash(_status: ExitStatus) -> Option<String> {
    // Signal detection not available on non-Unix.
    // Crashes in the subprocess won't be distinguished from normal failures.
    None
}
```
- Crash result construction (`crash_summary`): a single synthetic test result (the orchestrator doesn’t know which tests were in the file):

```rust
fn crash_summary(file: &Path, message: String, interner: &StringInterner) -> FileSummary {
    let mut summary = FileSummary::new(file.to_path_buf());
    summary.add_result(TestResult {
        name: interner.intern("llvm_backend_crash"),
        targets: vec![],
        outcome: TestOutcome::BackendCrash(message),
        duration: Duration::ZERO,
    });
    summary
}
```

Note: using `add_result()` instead of manual field construction ensures counter bookkeeping is always consistent (single source of truth for the `match` in `add_result`).
- Timeout detection (`wait_with_timeout`): `try_wait()` polling with a configurable timeout (default 60s):

```rust
enum WaitError {
    Timeout { elapsed: Duration },
    Io(std::io::Error),
}

fn wait_with_timeout(
    child: &mut Child,
    timeout: Duration,
) -> Result<ExitStatus, WaitError> {
    let start = Instant::now();
    loop {
        match child.try_wait() {
            Ok(Some(status)) => return Ok(status),
            Ok(None) if start.elapsed() > timeout => {
                let _ = child.kill();
                let _ = child.wait(); // reap zombie
                return Err(WaitError::Timeout { elapsed: start.elapsed() });
            }
            Ok(None) => std::thread::sleep(Duration::from_millis(50)),
            Err(e) => return Err(WaitError::Io(e)),
        }
    }
}
```

Note: `child.kill()` and `child.wait()` use `let _ =` to avoid propagating IO errors from already-dead processes. `WaitError` is an enum covering both IO errors and timeout, ensuring the `?`-free match on `try_wait()` handles all cases explicitly.
- Timeout configuration: add `worker_timeout: Duration` to `TestRunnerConfig` with a default of 60 seconds. Parse the `--worker-timeout=N` CLI flag in `main.rs` (inside the “test” match arm):

```rust
} else if let Some(secs) = arg.strip_prefix("--worker-timeout=") {
    if let Ok(n) = secs.parse::<u64>() {
        config.worker_timeout = Duration::from_secs(n);
    }
}
```
- Capture stderr on crash: include the last 5 lines of stderr in the `BackendCrash` message for diagnostic context. Use `String::from_utf8_lossy` since stderr may contain non-UTF8 bytes from LLVM C++. (A sketch of a tail helper follows below.)
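One way to build that diagnostic tail, shown as a hypothetical helper (`stderr_tail` is not an existing function; the name and shape are illustrative):

```rust
/// Last `n` lines of a worker's captured stderr, lossily decoded so that
/// non-UTF8 bytes from LLVM C++ don't abort message construction.
fn stderr_tail(raw: &[u8], n: usize) -> String {
    let text = String::from_utf8_lossy(raw);
    let lines: Vec<&str> = text.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}
```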
- Verify all tests from 02.2.T now PASS
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (02.2) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-02.2 retrospective` — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.2: no tooling gaps”. Update this subsection’s `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
02.3 Bounded Worker Pool
File(s): compiler/oric/src/test/runner/llvm_worker.rs
The current LLVM backend runs files sequentially because LLVM context creation contends on global state within a single process. With subprocess isolation, each worker has its own address space — parallelism is safe. Use a bounded pool to limit concurrency to ~CPU count.
- Implement a simple bounded worker pool:

```rust
struct WorkerPool {
    max_workers: usize,
    active: Vec<(PathBuf, Child)>,
    timeout: Duration,
}

impl WorkerPool {
    fn new(max_workers: usize, timeout: Duration) -> Self { ... }

    /// Submit a file. If the pool is full, wait for one to finish first.
    /// Returns the completed worker's result if the pool was full.
    fn submit(
        &mut self,
        file: PathBuf,
        child: Child,
    ) -> Option<(PathBuf, ExitStatus, Vec<u8>, Vec<u8>)> {
        let result = if self.active.len() >= self.max_workers {
            // Poll all active children with try_wait() to find one that's done.
            // If none are done, sleep briefly and retry (bounded by timeout).
            Some(self.wait_any())
        } else {
            None
        };
        self.active.push((file, child));
        result
    }

    /// Poll active children until one finishes. Collect its stdout/stderr
    /// by taking the pipe handles before wait().
    fn wait_any(&mut self) -> (PathBuf, ExitStatus, Vec<u8>, Vec<u8>) {
        // Note: must call child.stdout.take() and child.stderr.take()
        // BEFORE child.wait(), then read the taken handles to completion.
        // child.wait() closes stdin but stdout/stderr handles are owned
        // by the Stdio::piped() setup — taking them transfers ownership.
        // Alternative: use child.wait_with_output(), which handles this
        // automatically but consumes the Child.
        loop {
            for i in 0..self.active.len() {
                if let Ok(Some(status)) = self.active[i].1.try_wait() {
                    let (path, mut child) = self.active.swap_remove(i);
                    let stdout = read_child_pipe(child.stdout.take());
                    let stderr = read_child_pipe(child.stderr.take());
                    return (path, status, stdout, stderr);
                }
            }
            std::thread::sleep(Duration::from_millis(10));
        }
    }

    /// Wait for all remaining workers to finish.
    fn drain(&mut self) -> Vec<(PathBuf, ExitStatus, Vec<u8>, Vec<u8>)> { ... }

    /// Kill workers that have exceeded the timeout. Called periodically
    /// from wait_any() and drain(). Uses Instant tracking per worker
    /// (store the spawn time alongside the child in the `active` vec).
    fn kill_timed_out(&mut self) -> Vec<(PathBuf, Vec<u8>, Vec<u8>)> { ... }
}
```

Timeout integration: the pool must track each child’s spawn time (add `Instant` to the `active` tuple: `Vec<(PathBuf, Child, Instant)>`). Both `wait_any()` and `drain()` must call `kill_timed_out()` on each polling iteration to kill children that have exceeded `self.timeout`. A killed child returns `BackendCrash` with a timeout message. Without this, a hung worker would block `wait_any()` indefinitely.
- Default pool size: `std::thread::available_parallelism().map(|n| n.get()).unwrap_or(4)` — matches the CPU count, falls back to 4
- Add `parallel_workers: Option<usize>` and `worker_timeout: Duration` fields to the `TestRunnerConfig` struct (runner/mod.rs line 43-56). Also update the `Default` impl (line 58-68) with `parallel_workers: None` and `worker_timeout: Duration::from_secs(60)`. Note: `worker_timeout` was also mentioned in 02.2 for CLI parsing — both the struct field and the CLI parsing are needed. The `#[expect(clippy::struct_excessive_bools)]` is unaffected (the new fields are `Option<usize>` and `Duration`, not `bool`).
- Parse the `--parallel-workers=N` CLI flag in `main.rs` (inside the “test” match arm, after the other flags; a sketch follows below). `--no-parallel` (already parsed at line 128) sets `config.parallel = false`, which `run_llvm_tests_isolated` uses to set the pool size to 1.
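The parsing can mirror the `--worker-timeout` arm from 02.2; a sketch (the surrounding `else if` chain in `main.rs` is assumed):

```rust
} else if let Some(n) = arg.strip_prefix("--parallel-workers=") {
    if let Ok(workers) = n.parse::<usize>() {
        config.parallel_workers = Some(workers);
    }
}
```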
- When `--no-parallel` is specified, run workers sequentially (spawn, wait, parse, next file) — this is the simplest mode and useful for debugging.
- Top-level orchestrator function:

```rust
pub fn run_llvm_tests_isolated(
    files: &[TestFile],        // TestFile { path: PathBuf } from discovery
    config: &TestRunnerConfig,
    interner: &StringInterner, // orchestrator's interner (for crash_summary)
) -> Vec<FileSummary> {
    let binary = std::env::current_exe().expect("current_exe");
    let pool_size = if config.parallel {
        config.parallel_workers.unwrap_or_else(|| {
            std::thread::available_parallelism()
                .map(|n| n.get())
                .unwrap_or(4)
        })
    } else {
        1
    };
    // ... spawn workers through the pool, collect results
}
```
- File size enforcement: after implementing 02.1 + 02.2 + 02.3, check the line count:
  - `llvm_worker.rs` must stay under 500 lines
  - If pool logic pushes it over, extract `WorkerPool` to `compiler/oric/src/test/runner/worker_pool.rs` and declare `mod worker_pool;` in `llvm_worker.rs`
- 02.3.T — Tests (in `llvm_worker/tests.rs` or `worker_pool/tests.rs`; a sketch of the drain test follows below):
  - `test_pool_bounds_concurrency` — pool with `max_workers=2`, submit 5 `sleep 1` children, verify `active.len() <= 2` at all times
  - `test_pool_sequential_mode` — pool with `max_workers=1`, verify children run one at a time
  - `test_pool_drain_collects_all` — submit 3 children, call `drain()`, verify 3 results are returned
  - `test_pool_kills_timed_out_worker` — pool with `timeout=200ms`, submit a `sleep 999` child, verify `wait_any()` or `drain()` kills it within ~200ms and returns a result (does not hang forever)
  - Verify tests fail before implementing, then pass after
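A sketch of the drain test, assuming the `WorkerPool` shape above; `true` exits immediately, and `max_workers` is set above the submit count so every result comes back from `drain()`:

```rust
#[test]
fn test_pool_drain_collects_all() {
    use std::path::PathBuf;
    use std::process::{Command, Stdio};
    use std::time::Duration;

    let mut pool = WorkerPool::new(4, Duration::from_secs(5));
    for i in 0..3 {
        let child = Command::new("true")
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()
            .expect("spawn true");
        // The pool never fills, so submit() should not hand back a completed worker.
        assert!(pool.submit(PathBuf::from(format!("file{i}.ori")), child).is_none());
    }

    let results = pool.drain();
    assert_eq!(results.len(), 3);
}
```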
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (02.3) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-02.3 retrospective` — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.3: no tooling gaps”. Update this subsection’s `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
02.4 Integration with Test Runner Dispatch
File(s): compiler/oric/src/test/runner/mod.rs (lines 113-126), compiler/oric/src/commands/test.rs, test-all.sh
Wire the new subprocess orchestrator into the existing test runner dispatch, replacing the in-process run_file_llvm() path.
Hygiene note: runner/mod.rs is currently 572 lines — already over the 500-line limit. This subsection must NOT add net lines. The LLVM sequential comment block (lines 116-120) becomes dead code when the orchestrator handles LLVM dispatch, and should be removed. Target: net reduction of ~5 lines.
- [BLOAT] `runner/mod.rs:116-120` — remove the stale LLVM sequential execution comment block. It describes the old in-process approach that the orchestrator replaces.
- In `runner/mod.rs`, modify `run()` (line 113-126) to intercept LLVM dispatch at the `run()` level. When `config.backend == Backend::LLVM` and `!config.json`, route to the orchestrator:

```rust
pub fn run(&self, path: &Path) -> TestSummary {
    let test_files = discover_tests_in(path);
    if self.config.backend == Backend::LLVM && !self.config.json {
        // Orchestrator mode: spawn worker subprocesses per file
        let summaries = llvm_worker::run_llvm_tests_isolated(
            &test_files,
            &self.config,
            &self.interner,
        );
        let mut summary = TestSummary::new();
        for file_summary in summaries {
            summary.add_file(file_summary);
        }
        summary
    } else if self.config.parallel && self.config.backend != Backend::LLVM {
        self.run_parallel(&test_files)
    } else {
        self.run_sequential(&test_files)
    }
}
```

When `config.json == true` (worker mode), it falls through to `run_sequential()` -> `run_file_with_interner()` -> `run_file_llvm()` in-process. This is the worker’s execution path.
- Self-detection: the `--json` flag distinguishes worker from orchestrator. `json == true` = worker (in-process). `json == false` = orchestrator (spawn workers).
- Update `commands/test.rs` output formatting to handle `BackendCrash` outcomes (a sketch follows below):
  - In `print_file_results()` (line 137): add a `TestOutcome::BackendCrash(msg)` match arm with a `" CRASH: {name} - {msg}"` marker
  - In `print_summary_stats()` (line 159): add the `backend_crash` count to the `parts` vector (after `llvm_compile_fail`, same pattern)
  - In `print_llvm_error_breakdown()` (line 198): include the crash count in the output
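A hedged sketch of the two additions; the surrounding match in `print_file_results()` and the `parts` construction in `print_summary_stats()` are assumed, and field names such as `summary.backend_crash` are illustrative:

```rust
// print_file_results(): new arm alongside the existing outcome arms
TestOutcome::BackendCrash(msg) => {
    println!(" CRASH: {name} - {msg}");
}

// print_summary_stats(): extend the `parts` vector, mirroring llvm_compile_fail
if summary.backend_crash > 0 {
    parts.push(format!("{} backend crash", summary.backend_crash));
}
```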
- CRITICAL: Revert the weakened test gate in test-all.sh. With subprocess isolation, the parent process exits normally with code 0 or 1 — never killed by a signal. The `ORI_LLVM_CRASHED` escape hatch is unnecessary and must be removed. Specifically:
  - test-all.sh line 220-228: in `parse_ori_results()`, remove the `exit_code > 128` crash branch that sets `${prefix}_CRASHED=1`. With isolation, the orchestrator always exits normally. Remove the `_CRASHED` variable entirely.
  - test-all.sh line 236: remove `eval "${prefix}_CRASHED=0"` in the non-crash branch
  - test-all.sh line 458: remove the `elif [ "${ORI_LLVM_CRASHED:-0}" -eq 1 ]` display path (shows `CRASHED` status)
  - test-all.sh lines 524-528: remove `elif [ "${ORI_LLVM_CRASHED:-0}" -eq 1 ]` in the `emit_json()` crash suite path
  - test-all.sh line 546: remove the `ANY_CORE_FAILED` variable — with isolation, `ANY_FAILED` is the only check needed
  - test-all.sh lines 556-558: remove the `elif [ "$ANY_CORE_FAILED" -eq 0 ] && [ "${ORI_LLVM_CRASHED:-0}" -eq 1 ]` exit-0 escape hatch
  - Simplify the final status to just the `ANY_FAILED` check: exit 0 if `ANY_FAILED == 0`, exit 1 otherwise
  - Update `parse_ori_results()` to parse the `backend_crash` count: the new summary line from `print_summary_stats()` includes `N backend crash` — parse it alongside `passed`, `failed`, `skipped`, `llvm compile fail`
  - Satisfies mission criterion: “Weakened test gate reverted”
- Backwards compatibility: `ori test --backend=llvm <file>` without `--json` now spawns a single worker. Slightly slower (subprocess overhead) but it provides crash isolation.
- 02.4.T — Integration tests (a sketch of the crash-survival test follows below):
  - `test_orchestrator_directory_run` — run `ori test --backend=llvm tests/spec/types/` via `Command::new`, verify the parent exits normally and pass counts > 0
  - `test_orchestrator_survives_crash` — run `ori test --backend=llvm tests/spec/` via `Command::new`, verify the exit code is 0 or 1 (NOT 139), and stdout contains `BackendCrash` if any files crash
  - `test_test_all_no_llvm_crashed_var` — `grep -c ORI_LLVM_CRASHED test-all.sh` returns 0 (gate reversion verified at the file level)
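A sketch of the crash-survival test; `ori_binary_path()` is a hypothetical helper for locating the built `ori` binary (use whatever resolution the existing integration tests rely on), and the spec directory path comes from the test description:

```rust
#[test]
fn test_orchestrator_survives_crash() {
    use std::process::Command;

    // ori_binary_path() is illustrative, not an existing function.
    let output = Command::new(ori_binary_path())
        .args(["test", "--backend=llvm", "tests/spec/"])
        .output()
        .expect("run orchestrator");

    // The orchestrator must exit normally (0 or 1), never die by signal (e.g. 139).
    let code = output.status.code().expect("orchestrator was killed by a signal");
    assert!(code == 0 || code == 1, "unexpected exit code {code}");
}
```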
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (02.4) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-02.4 retrospective` — `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.4: no tooling gaps”. Update this subsection’s `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
02.R Third Party Review Findings
- None.
02.N Completion Checklist
- `llvm_worker.rs` module created with `spawn_llvm_worker`, `collect_worker_result`, `extract_framed_json`, `run_llvm_tests_isolated`
- Crash detection works: SIGSEGV (signal 11) and SIGABRT (signal 6) produce `BackendCrash` (via `ExitStatus::signal()` on Unix)
- Non-Unix fallback: `detect_crash` compiles on non-Unix (returns `None`)
- Timeout detection works: hanging workers are killed after a configurable timeout
- Worker pool bounds concurrency to CPU count (no fork-bomb)
- `--parallel-workers=N` and `--no-parallel` flags work
- `--worker-timeout=N` flag works (default 60s, parsed in `main.rs`)
- Test runner dispatch routes LLVM tests through the subprocess orchestrator
- `commands/test.rs` output handles `BackendCrash` outcomes in `print_file_results`, `print_summary_stats`, `print_llvm_error_breakdown`
- `test-all.sh` parses the updated summary format correctly (including the `backend_crash` count)
- Weakened test gate reverted: the `ORI_LLVM_CRASHED` exit-0 escape hatch is removed from test-all.sh — `grep -c ORI_LLVM_CRASHED test-all.sh` returns 0
- `ANY_CORE_FAILED` variable removed from test-all.sh — only `ANY_FAILED` is used
- In-process path still works for `--json` mode (the worker serving the orchestrator)
- TDD verified: all tests written before implementation
- Unit tests: 7 spawn/extract tests (02.1.T), 7 crash/timeout tests (02.2.T), 4 pool tests (02.3.T)
- Integration tests: 3 end-to-end tests (02.4.T)
- [BLOAT] runner/mod.rs: net zero or negative line change (remove the stale LLVM comment block)
- `llvm_worker.rs` under 500 lines (extract the pool to a `worker_pool.rs` submodule if needed)
- Debug AND release builds pass: `timeout 150 cargo test` (debug) and `timeout 150 cargo test --release` (release) both succeed
- `timeout 150 ./test-all.sh` passes — the LLVM backend no longer crashes the parent
- `./clippy-all.sh` passes
- All 2098+ AOT tests pass (no regressions)
- Plan annotation cleanup: `bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 02` returns 0 annotations
- Plan sync — update plan metadata
- `/tpr-review` passed
- `/impl-hygiene-review` passed
- `/improve-tooling` retrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which `diagnostics/` scripts you ran, which command sequences you repeated, where you added ad-hoc `dbg!`/`tracing` calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via a SEPARATE `/commit-push`. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See `.claude/skills/improve-tooling/SKILL.md` “Retrospective Mode” for the full protocol.
Exit Criteria: ori test --backend=llvm tests/spec/ completes without crashing the parent process. Workers that crash (SIGSEGV) produce BackendCrash outcomes that appear in the summary and cause exit code 1. ./test-all.sh reports the LLVM backend line with pass/fail/crash counts instead of CRASHED. The ORI_LLVM_CRASHED exit-0 escape hatch is removed from test-all.sh — crashes are real failures that block the gate. All AOT integration tests pass unchanged. Total wall-clock time for LLVM spec tests is within 2x of the current sequential in-process time.