# Section 01: LCFail Audit & Baseline
**Status:** Not Started

**Goal:** Every LCFail test is categorized by root cause and mapped to a specific Section 21A subsection. The roadmap reflects accurate current numbers. A tracking mechanism exists to monitor progress toward LCFail=0.

**Context:** The roadmap's Section 21A reports 1,985 LCFail tests, but the actual count is 3,956 — nearly double. This stale data makes it impossible to prioritize effectively. Before reprioritizing (Section 02), we need an accurate inventory of what's failing and why.

**Depends on:** Nothing — this is the starting point.
## 01.1 Fix Stale Roadmap Numbers

**File(s):** `plans/roadmap/section-21A-llvm.md`
The "Current Test Results" table at lines 72-76 of `section-21A-llvm.md` is wrong (CURRENT, STALE):

| Suite | Pass | Fail | Skip | LCFail | Total |
|-------|------|------|------|--------|-------|
| Ori spec (evaluator) | 3035 | 0 | 42 | - | 3077 |
| Ori spec (LLVM backend) | 1082 | 1 | 9 | 1985 | 3077 |
| Rust unit tests (LLVM) | 527 | 0 | 15 | - | 542 |
- Update the test results table with actual current numbers (CORRECTED, 2026-03-25):

  | Suite | Pass | Fail | Skip | LCFail | Total |
  |-------|------|------|------|--------|-------|
  | Ori spec (interpreter) | 4181 | 0 | 42 | - | 4223 |
  | Ori spec (LLVM backend) | 257 | 0 | 10 | 3956 | 4223 |

  Note: The evaluator test count grew from 3,035 → 4,181 (new spec tests added). The LLVM pass count dropped from 1,082 → 257 (likely due to changes in codegen that broke previously-passing files, or test reorganization).
- Update ALL occurrences of the stale "2,472" `assert_eq` count in Section 21A. The actual count is 3,946 (verified 2026-03-25 via `grep -r "assert_eq" tests/spec/ | wc -l`). Three locations to update:
  - Line 93: "used in 2,472 test call sites" → "used in 3,946 test call sites"
  - Line 371: "affects 2,472+ assert_eq call sites" → "affects 3,946+ assert_eq call sites"
  - Line 1084: "blocks 2,472+ test call sites" → "blocks 3,946+ test call sites"
- Add a "Last Updated" annotation to the test results table so staleness is visible.
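The recount and the stale-string search can both be scripted as a sanity check before editing. A minimal sketch, demonstrated on a throwaway fixture so it runs anywhere; against the real repo, point the same two commands at `tests/spec/` and `plans/roadmap/section-21A-llvm.md` (the fixture paths and file contents below are invented for illustration):

```shell
# Sketch: recount assert_eq call sites and locate stale "2,472" mentions.
# The fixture stands in for the real tests/spec/ tree and roadmap file.
tmp=$(mktemp -d)
mkdir -p "$tmp/tests/spec"
printf 'assert_eq(1, 1)\nassert_eq(2, 2)\n' > "$tmp/tests/spec/sample.ori"
printf 'used in 2,472 test call sites\n'    > "$tmp/roadmap.md"

# Recount call sites (against the real tree this should print 3,946):
count=$(grep -r "assert_eq" "$tmp/tests/spec/" | wc -l | tr -d ' ')
echo "assert_eq call sites: $count"

# List lines still carrying the stale number, with line numbers:
grep -n "2,472" "$tmp/roadmap.md"

rm -rf "$tmp"
```
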
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (01.1) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection's debugging journey (per `.claude/skills/improve-tooling/SKILL.md` "Per-Subsection Workflow"): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-01.1 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid, while `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: "Retrospective 01.1: no tooling gaps". Update this subsection's `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files; check CLAUDE.md and canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
## 01.2 LCFail Root Cause Categorization

**File(s):** New tracking table in this section (analysis output, not committed code)
LCFail is a file-level failure: when LLVM codegen fails on a module, ALL tests in that file become LCFail. The root cause is always a missing codegen feature. Categorization maps each failing file to its blocking feature.
- Run the LLVM spec tests with verbose error output to capture the actual compilation error for each failing file:
  `./target/release/ori test --verbose --backend=llvm tests/ 2>&1 | tee lcfail-audit.log`
- Parse `lcfail-audit.log` to extract per-file error messages. The LLVM backend reports errors in two forms:
  - `"LLVM compilation failed: <msg>"` — from `compile_module_with_tests()` returning `Err` (a codegen error with a structured message)
  - `"LLVM backend error: <msg>"` — from `catch_unwind` catching a panic (LLVM fatal error, assertion failure, etc.)

  Typical codegen error messages include:
  - Missing codegen for expression types
  - Type lowering failures
  - Function resolution failures (often generic functions skipped by `sig.is_generic()`)
  - IR verification failures (from `fn_val.verify(true)` inside compilation)
- Categorize each failing file into one of these root cause buckets (mapped to 21A subsections):

  | Category | 21A Subsection | Typical Error |
  |----------|----------------|---------------|
  | Generic monomorphization | 21.7 (Function Sequences & Expressions) | `sig.is_generic()` skips in `declare_all`/`define_all` — any file calling `assert_eq<T>` |
  | Sum type codegen | 21.2 (Type Lowering) | `Ordering` type, prelude `compare()` return type |
  | Lambda/closure ABI | 21.11 (Lambda & Closure Support) | Function type parameters, closure captures |
  | Operator traits + impl blocks | 21.4 (Operator Trait Dispatch) | Struct methods, operator trait dispatch for user types |
  | Built-in functions | 21.12 (Built-in Functions) | Missing prelude function implementations |
  | Control flow | 21.5 (Control Flow) | for-yield, try, catch expressions |
  | Pattern matching | 21.6 (Pattern Matching) | match expressions, destructuring |
  | Collections/iterators | 21.10 (Collections & Iterators) | List/map/set operations, iterator trait |
  | Type lowering | 21.2 (Type Lowering) | Struct layouts, channel types, fixed-capacity lists |
  | Expression codegen | 21.3 (Expression Codegen) | Missing operators, conversions, string ops |
  | Other | various | Capabilities, FFI, concurrency |
- For each category, count:
  - Number of files blocked (primary metric — since LCFail is file-level)
  - Number of test functions blocked (secondary metric — total LCFail impact)
  - Whether fixing THIS feature alone would unblock the file, or whether the file has multiple blocking features
- Identify cascade files: files that fail due to multiple missing features. These won't be unblocked by any single feature and will be the last to pass. Track them separately.
- Record the categorization as a table:

  | Category | Files Blocked | Tests Blocked | 21A Subsection | Cascade? |
  |----------|---------------|---------------|----------------|----------|
  | Generic monomorphization | ??? | ??? | 21.7 | Many files ALSO need this |
  | Sum types | ??? | ??? | 21.2 | ... |
  | ... | ... | ... | ... | ... |
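The parse, categorize, and cascade-detection steps above can be sketched end to end. This runs against a fabricated `lcfail-audit.log`; the `<file>: <error>` log line shape and the keyword-to-bucket mapping are assumptions to adjust once real `--verbose` output is in hand:

```shell
# Sketch: per-file error extraction, bucket assignment, cascade detection.
# Log format and keyword->bucket mapping are assumed, not confirmed.
log=$(mktemp)
cat > "$log" <<'EOF'
tests/spec/generics.ori: LLVM compilation failed: function is generic, skipped
tests/spec/match.ori: LLVM compilation failed: no codegen for match expression
tests/spec/mixed.ori: LLVM compilation failed: function is generic, skipped
tests/spec/mixed.ori: LLVM backend error: type lowering failed for Ordering
EOF

categorize() {
  case "$1" in
    *generic*)         echo "generic-monomorphization (21.7)" ;;
    *"type lowering"*) echo "type-lowering (21.2)" ;;
    *match*)           echo "pattern-matching (21.6)" ;;
    *)                 echo "other" ;;
  esac
}

# One "<file><TAB><bucket>" pair per distinct blocker per file.
pairs=$(while IFS= read -r line; do
  file=${line%%:*}
  msg=${line#*: }
  printf '%s\t%s\n' "$file" "$(categorize "$msg")"
done < "$log" | sort -u)

# Files blocked per bucket (the primary metric):
printf '%s\n' "$pairs" | cut -f2 | sort | uniq -c

# Cascade files: blocked by more than one bucket.
cascades=$(printf '%s\n' "$pairs" | cut -f1 | uniq -d)
echo "cascade files: ${cascades:-none}"
rm -f "$log"
```

In this fixture, `tests/spec/mixed.ori` hits two buckets and is flagged as a cascade file, matching the definition in the step above.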
### Test Strategy
- Matrix: This section is analysis, not code. Verification is by checking that every LCFail file appears in exactly one primary category.
- Semantic pin: The categorization table itself is the deliverable. It becomes stale if not updated, so Section 02 creates a tracking mechanism.
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (01.2) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection's debugging journey (per `.claude/skills/improve-tooling/SKILL.md` "Per-Subsection Workflow"): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-01.2 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid, while `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: "Retrospective 01.2: no tooling gaps". Update this subsection's `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files; check CLAUDE.md and canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
## 01.3 File-Level Failure Analysis

**File(s):** Analysis of `compiler/oric/src/test/runner/llvm_backend.rs`
Understanding HOW LCFail works is essential for tracking. Key facts from `llvm_backend.rs`:

- The JIT test runner (`run_file_llvm`) compiles all functions in a file into a single JIT module via `compile_module_with_tests()`.
- Compilation is wrapped in `std::panic::catch_unwind`. If compilation returns `Err(e)` (codegen error) OR panics (LLVM fatal error), ALL tests in the file become `TestOutcome::LlvmCompileFail(msg)`.
- The error message is the codegen error's `.message` field (for `Err`) or the panic payload string (for panics). There is no direct `LLVMVerifyModule` call in the test runner — verification happens inside `compile_module_with_tests` via `fn_val.verify(true)`.
- LCFail does NOT count as a test failure — `exit_code()` returns 0 even with LCFail tests (they are filtered out of `has_failures()`).
- Verify that the current LCFail mechanism correctly reports the blocking error per file (not just "compilation failed" but the specific LLVM error).
- Check whether the current error messages are specific enough to categorize root causes. If error messages are too generic (e.g., just "LLVM compilation failed"), we may need to enhance the error reporting in `llvm_backend.rs` to include which function/expression caused the failure.
Document the LCFail tracking infrastructure:
- Where LCFail is defined:
compiler/oric/src/test/result/mod.rs:20(TestOutcome::LlvmCompileFail(String)) - Where LCFail is produced:
compiler/oric/src/test/runner/llvm_backend.rs(lines 272-314, inOk(Err(e))andErr(panic_info)arms ofcompile_resultmatch) - Where LCFail is counted:
FileSummary.llvm_compile_fail(line 121 ofresult/mod.rs),TestSummary.llvm_compile_fail(line 178 ofresult/mod.rs) - Where LCFail is displayed:
compiler/oric/src/commands/test.rs(lines 171-172 for test count, lines 179/201-202 for file count) - How LCFail interacts with exit codes:
TestSummary::exit_code()(line 226 ofresult/mod.rs) — LCFail does NOT cause exit code 1 (failure);has_failures()only checksfailed > 0 || error_files > 0. However, if the ONLY results are LCFail (total passed+failed+skipped == 0 AND error_files == 0),exit_code()returns 2 (“no tests found”) because thetotal() == 0guard fires first — note the guard also checksllvm_compile_fail == 0 && llvm_compile_fail_files == 0, so a fully-LCFail suite returns exit code 0, NOT 2. Only truly empty runs return 2.
- Tracking mechanism decision: for automated LCFail tracking, choose one:
  - (a) Parse existing verbose output (recommended for now): the `--verbose` flag already shows per-file errors. A shell script can parse the output of `./target/release/ori test --verbose --backend=llvm tests/` to extract per-file error messages and counts. No code changes needed.
  - (b) Add an `--lcfail-details` flag (better long-term): machine-readable JSON output of per-file LCFail errors. Requires changes to the test runner. Only pursue if (a) proves insufficient for Section 02's milestone tracking.

  Implement option (a) as a script in `scripts/lcfail-report.sh`:

  ```bash
  #!/bin/bash
  # scripts/lcfail-report.sh — LCFail tracking and regression detection
  # Usage: ./scripts/lcfail-report.sh [--check] [--update-baseline]
  #
  # Default: prints current LCFail count and per-category breakdown
  # --check: compare against test-baselines/lcfail-count.txt, exit 1 if regression
  # --update-baseline: write current count to test-baselines/lcfail-count.txt
  ```

  The script runs `./target/release/ori test --backend=llvm tests/` (requires a pre-built release binary), parses the summary line for the LCFail count, and optionally parses `--verbose` output for per-file error categories. Exit 0 in informational mode; exit 1 in `--check` mode when the count increased. Sections 02.2 and 06.1 will use this script for tracking.
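A minimal sketch of the option (a) script body. The `lcfail: <N>` summary field name is an assumption to verify against real `ori test` output, and a hardcoded `SUMMARY` stands in here for invoking the binary:

```shell
#!/bin/sh
# Sketch of scripts/lcfail-report.sh internals. SUMMARY stands in for
# `./target/release/ori test --backend=llvm tests/` output; the
# "lcfail: <N>" field name is an assumption, not the confirmed format.
SUMMARY="passed: 257  failed: 0  skipped: 10  lcfail: 3956"
BASELINE_FILE="${BASELINE_FILE:-test-baselines/lcfail-count.txt}"

count=$(printf '%s\n' "$SUMMARY" | sed -n 's/.*lcfail: \([0-9][0-9]*\).*/\1/p')

case "${1:-}" in
  --update-baseline)
    mkdir -p "$(dirname "$BASELINE_FILE")"
    printf '%s\n' "$count" > "$BASELINE_FILE"
    echo "baseline set to $count"
    ;;
  --check)
    baseline=$(cat "$BASELINE_FILE")
    if [ "$count" -gt "$baseline" ]; then
      echo "REGRESSION: LCFail grew $baseline -> $count"
      exit 1
    fi
    echo "OK: LCFail $count (baseline $baseline)"
    ;;
  *)
    echo "LCFail count: $count"
    ;;
esac
```

The `--check` branch gives Section 02.2 a CI-friendly gate: the exit status, not the text, is the contract.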
### Test Strategy
- Matrix: N/A — this subsection is documentation of existing infrastructure.
- TDD: for the `scripts/lcfail-report.sh` script:
  - Script produces a count that matches the summary line's LCFail number
  - Script lists per-file error categories
  - Script exits 0 (informational, not a test gate)
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (01.3) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection's debugging journey (per `.claude/skills/improve-tooling/SKILL.md` "Per-Subsection Workflow"): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-01.3 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid, while `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: "Retrospective 01.3: no tooling gaps". Update this subsection's `status` in the section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files; check CLAUDE.md and canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
## 01.R Third Party Review Findings
- None.
## 01.4 Completion Checklist
- Section 21A test results table updated with correct numbers (4,181/257/3,956)
- Every LCFail file categorized into exactly one primary root cause bucket
- Categorization table includes file counts and test function counts per category
- Cascade files (multiple blockers) identified and tracked separately
- LCFail infrastructure documented (where defined, produced, counted, displayed)
- `scripts/lcfail-report.sh` implemented and produces counts matching the test runner's summary line
- `./test-all.sh` green (no regressions from roadmap updates)
- `/tpr-review` passed — independent Codex review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER `/tpr-review` is clean.
- `/improve-tooling` retrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section's debugging journey (which `diagnostics/` scripts you ran, which command sequences you repeated, where you added ad-hoc `dbg!`/`tracing` calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via a SEPARATE `/commit-push`. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See `.claude/skills/improve-tooling/SKILL.md` "Retrospective Mode" for the full protocol.
**Exit Criteria:** A complete categorization table exists mapping all ~188 LCFail files to their primary root cause bucket, with counts per category. The roadmap reflects accurate test numbers. The categorization data is sufficient for Section 02 to create an impact-ordered implementation sequence.