Section 02: File Hygiene
Status: Not Started
Goal: All ori_parse source files are under 500 lines. lib.rs (1,326L), incremental/copier.rs (1,595L), error/kind.rs (1,041L), cursor/mod.rs (665L), outcome/mod.rs (697L), grammar/expr/patterns/match_patterns.rs (595L), and grammar/item/function/mod.rs (532L) are split into focused submodules. All existing tests pass unchanged: this is a purely structural refactoring with zero behavioral changes.
Context: Per .claude/rules/impl-hygiene.md, source files (excluding tests) must be under 500 lines. Seven files in ori_parse exceed this limit; lib.rs and copier.rs are the critical blockers for the performance work in Section 04. lib.rs mixes the parser struct definition, context management, parsing entry points, module parsing, and error collection in a single file. copier.rs is a LEAK violation: manual AST node copying that should be centralized.
Splitting these files before performance work ensures that optimization patches touch focused files, not monoliths. It also makes code review tractable.
Reference implementations:
- Ori (.claude/rules/impl-hygiene.md): “500-line limit (excl. tests); exceeding = BLOAT finding. Proactive split at ~450 lines.”
Depends on: None.
02.1 Split lib.rs (1,326L → ~4 files)
File(s): compiler/ori_parse/src/lib.rs
lib.rs currently contains: KnownNames struct + constructor (~20L), Parser struct + constructor + capacity helpers (~60L), context management (~60L), error context helpers (~60L), token capture (~40L), speculative parsing/snapshots (~80L), parse_module() (~90L), parse_imports() (~100L), dispatch_declaration() (~190L), handle_declaration_error + recovery methods (~60L), parse_module_incremental() bridge (~50L), ParseOutput impl (~30L), free functions parse/parse_with_metadata/parse_incremental (~80L), and post-parse analysis (~60L).
Per hygiene rules, lib.rs should be an index: //! doc, mod declarations, pub use re-exports — no function bodies.
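A minimal sketch of that index shape, under stated assumptions: the inline mod parser { ... } body stands in for parser/mod.rs so the snippet compiles on its own (the real lib.rs would contain only mod parser; and open with a //! crate doc), and ParseOutput's single field and the body of parse() are hypothetical placeholders, not the real ori_parse API.

```rust
// Sketch of an index-style lib.rs. The real file would begin with a
// `//!` crate doc and declare `mod parser;` pointing at parser/mod.rs.

mod parser {
    // Stand-in for parser/mod.rs; the real ParseOutput has more fields.
    pub struct ParseOutput {
        pub token_count: usize,
    }

    /// Hypothetical thin wrapper: the real parse() drives
    /// Parser::parse_module(); this placeholder only counts
    /// whitespace-separated tokens.
    pub fn parse(source: &str) -> ParseOutput {
        ParseOutput {
            token_count: source.split_whitespace().count(),
        }
    }
}

// The public surface stays at the crate root, so downstream
// `use ori_parse::{parse, ParseOutput};` keeps compiling after the move.
pub use parser::{parse, ParseOutput};
```

The design point is the final pub use line: because public names are re-exported from the crate root, moving their definitions into submodules does not change any downstream import path.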
- Extract the Parser struct definition, constructor, and core methods to parser/mod.rs:
  - Parser<'a> struct with cursor, arena, context, known, deferred_errors, deferred_warnings
  - Parser::new() constructor, estimated_source_len(), take_arena() (test-only)
  - Context management methods (with_context, without_context, has_context, allows_struct_lit)
  - Error context methods (in_error_context, in_error_context_result)
  - Token capture methods (with_capture, capture_if)
  - Speculative parsing methods (snapshot, restore, try_parse, look_ahead)
  - Utility methods (check_one_of, expect_one_of)
- Extract module-level parsing to parser/module_parse.rs:
  - Parser::parse_module() (~90L)
  - Parser::parse_imports() (~100L)
  - Parser::dispatch_declaration() (~190L)
  - Parser::handle_declaration_error() and recovery methods (~60L)
  - Parser::parse_module_incremental() bridge (~50L)
  - handle_outcome() helper
  - Note: these estimates already total ~490 lines, above the ~450-line proactive-split threshold, so module_parse.rs may need a further split (e.g., moving parse_imports() into its own file).
- Extract KnownNames pre-interning to known_names.rs:
  - KnownNames struct with pre-interned contextual-keyword Name fields
  - KnownNames::new(interner) constructor
- Item-level parsing dispatch already exists in grammar/item/mod.rs with submodules, so no extraction is needed. Verify that dispatch_declaration() delegates to these correctly.
- Wire up the new submodules: add mod parser; and mod known_names; to lib.rs. Update all internal use paths within ori_parse that reference Parser, KnownNames, or any other moved item. Downstream crates (oric, benchmarks) should not be affected, since they import only the public API (parse(), ParseOutput, etc.), which will be re-exported from lib.rs.
- Reduce lib.rs to an index:
  - //! module doc
  - mod declarations for all submodules
  - pub use re-exports for parse(), parse_with_metadata(), parse_incremental(), ParseOutput, ParseError, ParseOutcome, ParseContext, ParseWarning
  - Free functions parse(), parse_with_metadata(), parse_incremental(), and find_token_end_before() move to the parser/ submodule (the parse entry points are thin wrappers around Parser::parse_module())
  - The ParseOutput impl (~30L) and post-parse analysis (~60L) also contain function bodies, so they must move out of lib.rs as well
  - FunctionOrTest enum and ParsedAttrs re-export move to parser/ or grammar/
  - Target: < 80 lines
- Verify all tests pass in debug: timeout 150 cargo t -p ori_parse
- Verify all spec tests pass: timeout 150 cargo st
- Verify all tests pass in release: cargo b --release && timeout 150 cargo t -p ori_parse --release
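The extractions above lean on two Rust rules: inherent impl blocks for one type may appear in any module of the same crate, and a child module can read its parent's private fields. The sketch below illustrates both, with inline modules standing in for known_names.rs, parser/mod.rs, and parser/module_parse.rs. The Interner and Name types, the field names (where_kw, with_kw), and the method count_where_clauses are hypothetical and far simpler than ori_parse's real types.

```rust
use std::collections::HashMap;

// Hypothetical interner and Name type; the real interner has its own API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Name(u32);

#[derive(Default)]
pub struct Interner {
    map: HashMap<String, Name>,
}

impl Interner {
    /// Returns the existing Name for `s`, or assigns the next id.
    pub fn intern(&mut self, s: &str) -> Name {
        let next = Name(self.map.len() as u32);
        *self.map.entry(s.to_string()).or_insert(next)
    }
}

// Stand-in for known_names.rs: contextual keywords are interned once,
// so the parser compares Name values instead of strings.
mod known_names {
    use super::{Interner, Name};

    pub struct KnownNames {
        pub(crate) where_kw: Name, // illustrative field names
        pub(crate) with_kw: Name,
    }

    impl KnownNames {
        pub fn new(interner: &mut Interner) -> Self {
            KnownNames {
                where_kw: interner.intern("where"),
                with_kw: interner.intern("with"),
            }
        }
    }
}

// Stand-in for parser/mod.rs: struct definition and constructor.
mod parser {
    use super::known_names::KnownNames;
    use super::Name;

    pub struct Parser {
        tokens: Vec<Name>, // private: visible here and in child modules
        pos: usize,
        known: KnownNames,
    }

    impl Parser {
        pub fn new(tokens: Vec<Name>, known: KnownNames) -> Self {
            Parser { tokens, pos: 0, known }
        }
    }

    // Stand-in for parser/module_parse.rs (`mod module_parse;` in mod.rs).
    // A second impl block for Parser is legal here, and this child module
    // may read Parser's private fields.
    mod module_parse {
        use super::Parser;

        impl Parser {
            // pub(crate): a bare `fn` here would be private to module_parse.
            pub(crate) fn count_where_clauses(&mut self) -> usize {
                let mut n = 0;
                while self.pos < self.tokens.len() {
                    if self.tokens[self.pos] == self.known.where_kw {
                        n += 1;
                    }
                    self.pos += 1;
                }
                n
            }
        }
    }
}
```

One consequence for the split: methods moved into module_parse.rs need pub(super) or pub(crate) visibility if code in parser/mod.rs or the lib.rs-level wrappers calls them.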
TDD ordering: Write no new tests (purely structural). Existing tests serve as the regression guard. Run the full suite BEFORE any split to establish a green baseline, then after each split to verify zero regressions. Debug AND release must pass.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (02.1), MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”), covering which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, and where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE /commit-push using a valid conventional-commit type (e.g., build(diagnostics): ... — surfaced by section-02.1 retrospective; build, test, chore, ci, and docs are valid, while tools(...) is rejected by the lefthook commit-msg hook). This is mandatory even when nothing felt painful. If there are genuinely no gaps, document briefly: “Retrospective 02.1: no tooling gaps”. Update this subsection’s status in the section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md and canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
02.2 Split copier.rs (1,595L → ~3 files)
File(s): compiler/ori_parse/src/incremental/copier.rs
The incremental/ directory already contains: mod.rs (dispatch), copier.rs (1,595L), cursor.rs (navigation), decl.rs (declaration collection/categorization), and tests.rs. The existing decl.rs handles declaration discovery and categorization — NOT copying. The proposed copy_decl.rs handles declaration-level deep-copy logic (distinct from decl.rs).
copier.rs contains a monolithic deep_copy_expr() function that matches on every ExprKind variant to clone AST nodes with span adjustment for incremental parsing. This is a LEAK violation: manual variant matching that should be centralized or auto-generated. Splitting the file relocates this match into copy_expr.rs; it does not by itself resolve the LEAK finding.
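To make the shape of deep_copy_expr() concrete, here is a minimal sketch assuming a tiny arena-backed AST with three ExprKind variants. Every type here (Span, ExprId, Expr, Arena) is a hypothetical simplification of ori_ir's real types. Each variant needs its own arm that recursively copies child nodes into the new arena and shifts spans by the edit delta, which is why the real function grows with every ExprKind variant.

```rust
// Hypothetical, heavily simplified AST; the real Expr/ExprKind differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExprId(u32);

#[derive(Debug, PartialEq)]
pub enum ExprKind {
    Int(i64),
    Binary { lhs: ExprId, rhs: ExprId },
    Call { func: ExprId, args: Vec<ExprId> },
}

#[derive(Debug, PartialEq)]
pub struct Expr {
    pub kind: ExprKind,
    pub span: Span,
}

#[derive(Default)]
pub struct Arena {
    pub exprs: Vec<Expr>,
}

impl Arena {
    pub fn alloc(&mut self, expr: Expr) -> ExprId {
        self.exprs.push(expr);
        ExprId(self.exprs.len() as u32 - 1)
    }

    pub fn get(&self, id: ExprId) -> &Expr {
        &self.exprs[id.0 as usize]
    }
}

/// Moves both ends of a span by `delta` bytes (negative = earlier).
fn shift(span: Span, delta: i64) -> Span {
    let move_by = |pos: u32| {
        let moved = i64::from(pos) + delta;
        assert!((0..=i64::from(u32::MAX)).contains(&moved), "span shifted out of range");
        moved as u32
    };
    Span { start: move_by(span.start), end: move_by(span.end) }
}

/// Copies `id` and all of its children from `old` into `new`, shifting
/// every span by `delta`. One arm per ExprKind variant.
pub fn deep_copy_expr(old: &Arena, new: &mut Arena, id: ExprId, delta: i64) -> ExprId {
    let expr = old.get(id);
    let kind = match &expr.kind {
        ExprKind::Int(v) => ExprKind::Int(*v),
        ExprKind::Binary { lhs, rhs } => ExprKind::Binary {
            lhs: deep_copy_expr(old, new, *lhs, delta),
            rhs: deep_copy_expr(old, new, *rhs, delta),
        },
        ExprKind::Call { func, args } => ExprKind::Call {
            func: deep_copy_expr(old, new, *func, delta),
            args: args.iter().map(|&a| deep_copy_expr(old, new, a, delta)).collect(),
        },
    };
    new.alloc(Expr { kind, span: shift(expr.span, delta) })
}
```

Children are allocated before their parent, so the copied root always ends up last in the new arena.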
- Extract expression copying to incremental/copy_expr.rs:
  - deep_copy_expr() function and its helpers
  - Variant-specific copy logic for ExprKind variants
- Extract declaration copying to incremental/copy_decl.rs:
  - Declaration-level deep-copy logic (functions, types, traits, impls)
  - Note: this is DISTINCT from the existing decl.rs, which handles declaration discovery/categorization, not copying
- Reduce copier.rs to a dispatch hub:
  - Public copy_module() API
  - Dispatch to the copy_expr and copy_decl submodules
  - Target: < 200 lines
  - Note: if all 1,595 lines are production code, three files under 500 lines cannot hold them (3 × 499 = 1,497), so copy_expr.rs will likely need a further split (e.g., by ExprKind family) unless the copying code shrinks.
- Verify incremental tests pass: timeout 150 cargo t -p ori_parse -- incremental
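A sketch of the intended hub layout, assuming simplified Box-based types rather than the real arena-backed AST. Span, Expr, Decl, shift, and deep_copy_expr's signature are hypothetical; only copy_module, copy_expr, and copy_decl come from the plan. Inline modules stand in for copy_expr.rs and copy_decl.rs, and the crate root plays the role of incremental/ plus the copier.rs hub. The span helper lives in the parent so both submodules share one implementation, and pub(super) restricts the submodule functions to their parent module and its children.

```rust
// Hypothetical, simplified types; the real AST is arena-based.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    Int(i64, Span),
    Neg(Box<Expr>, Span),
}

#[derive(Clone, Debug, PartialEq)]
pub enum Decl {
    Function { name: String, body: Expr, span: Span },
    TypeAlias { name: String, span: Span },
}

// Shared span arithmetic: one copy, used by both submodules.
fn shift(span: Span, delta: i64) -> Span {
    let move_by = |pos: u32| {
        let moved = i64::from(pos) + delta;
        assert!((0..=i64::from(u32::MAX)).contains(&moved), "span shifted out of range");
        moved as u32
    };
    Span { start: move_by(span.start), end: move_by(span.end) }
}

// Stand-in for incremental/copy_expr.rs.
mod copy_expr {
    use super::{shift, Expr};

    pub(super) fn deep_copy_expr(expr: &Expr, delta: i64) -> Expr {
        match expr {
            Expr::Int(v, span) => Expr::Int(*v, shift(*span, delta)),
            Expr::Neg(inner, span) => {
                Expr::Neg(Box::new(deep_copy_expr(inner, delta)), shift(*span, delta))
            }
        }
    }
}

// Stand-in for incremental/copy_decl.rs (distinct from decl.rs).
mod copy_decl {
    use super::copy_expr::deep_copy_expr;
    use super::{shift, Decl};

    pub(super) fn copy_decl(decl: &Decl, delta: i64) -> Decl {
        match decl {
            Decl::Function { name, body, span } => Decl::Function {
                name: name.clone(),
                body: deep_copy_expr(body, delta),
                span: shift(*span, delta),
            },
            Decl::TypeAlias { name, span } => Decl::TypeAlias {
                name: name.clone(),
                span: shift(*span, delta),
            },
        }
    }
}

/// Public entry point: the hub only routes work to the submodules.
pub fn copy_module(decls: &[Decl], delta: i64) -> Vec<Decl> {
    decls.iter().map(|d| copy_decl::copy_decl(d, delta)).collect()
}
```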
TDD ordering: Existing incremental tests (incremental/tests.rs, 547 lines) serve as the regression guard.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (02.2), MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”), covering which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, and where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE /commit-push using a valid conventional-commit type (e.g., build(diagnostics): ... — surfaced by section-02.2 retrospective; build, test, chore, ci, and docs are valid, while tools(...) is rejected by the lefthook commit-msg hook). This is mandatory even when nothing felt painful. If there are genuinely no gaps, document briefly: “Retrospective 02.2: no tooling gaps”. Update this subsection’s status in the section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md and canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
02.3 Split Remaining Oversized Files
File(s): compiler/ori_parse/src/error/kind.rs (1,041L), compiler/ori_parse/src/cursor/mod.rs (665L), compiler/ori_parse/src/outcome/mod.rs (697L), compiler/ori_parse/src/grammar/expr/patterns/match_patterns.rs (595L), compiler/ori_parse/src/grammar/item/function/mod.rs (532L)
These files are over 500 lines but less critical than lib.rs and copier.rs. Split them to stay within hygiene limits.
- error/kind.rs (1,041L): Extract context enums (ErrorContext, ParseExpected, etc.) to error/context_kinds.rs. Keep the ParseErrorKind enum in kind.rs. Target: every resulting file < 500L. Two files under 500 lines hold at most 998 lines, so unless the split removes at least 43 lines, a third module will be needed.
  - While splitting, remove all decorative // === banners (e.g., // === Token-level errors ===, // === Expression errors ===; ~13 instances) and replace them with plain // Section name comments per hygiene rules.
- cursor/mod.rs (665L): Extract the error-building methods (make_expect_error, make_expect_ident_error, make_expect_ident_or_keyword_error) to cursor/errors.rs. Keep navigation and token-access methods in mod.rs. Target: mod.rs < 500L.
- outcome/mod.rs (697L): Extract synchronization/recovery helper methods to outcome/recovery_helpers.rs. Keep the ParseOutcome<T> enum and core From/conversion impls in mod.rs. Target: mod.rs < 500L.
  - While splitting, remove all decorative // === banners (// === Constructors ===, // === Predicates ===, // === Transformations ===, // === Conversions ===, // === Backtracking Macros ===; 5 instances at lines 104, 142, 188, 401, 409) and replace them with plain // Section name comments per hygiene rules.
- grammar/expr/patterns/match_patterns.rs (595L): Extract complex match-arm parsing helpers to grammar/expr/patterns/match_helpers.rs. Keep the primary parse_match_arm and parse_match_pattern dispatch in match_patterns.rs. Target: both files < 500L.
- grammar/item/function/mod.rs (532L): Extract function clause parsing or contract parsing to a sibling submodule (e.g., function/clauses.rs or function/contracts.rs). Target: mod.rs < 500L.
- Verify all tests pass after each split: timeout 150 cargo t -p ori_parse
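If other ori_parse modules import the context enums through error::kind, one option during the kind.rs split is a compatibility re-export so those paths keep compiling; the alternative, consistent with the 02.1 approach, is to update the use paths directly. A minimal sketch with inline modules standing in for error/kind.rs and error/context_kinds.rs; the enum variants are illustrative, not the real ones, and the plain section comments show the banner replacement.

```rust
mod error {
    // Stand-in for error/context_kinds.rs: context enums moved out of kind.rs.
    pub mod context_kinds {
        #[derive(Clone, Copy, Debug, PartialEq, Eq)]
        pub enum ErrorContext {
            Expression,
            Pattern,
        }
    }

    // Stand-in for error/kind.rs.
    pub mod kind {
        // Compatibility re-export: `error::kind::ErrorContext` still resolves
        // even though the definition now lives in context_kinds.rs.
        pub use super::context_kinds::ErrorContext;

        #[derive(Debug, PartialEq)]
        pub enum ParseErrorKind {
            // Token-level errors
            UnexpectedToken { context: ErrorContext },
            // Expression errors
            ExpectedExpression,
        }
    }

    pub use self::context_kinds::ErrorContext;
    pub use self::kind::ParseErrorKind;
}
```

Both paths name the same type, so no value changes identity across the move.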
Codebase scan findings — files touched by later sections that also exceed 500 lines:
The following files are modified by Sections 03-05 and also exceed the 500-line limit. They are NOT in ori_parse and thus not part of this section’s primary scope, but the plan should not touch bloated files without at least noting the issue. These are recorded here for awareness; splitting them is optional in this plan but MANDATORY if the plan’s changes would push them further over the limit.
- compiler/ori_ir/src/arena/mod.rs (530L, ~528L production; #[cfg(test)] mod tests; at L529): Touched by Sections 03.1 and 04.1/04.2. 28 lines over the limit. If adding code, split first.
- compiler/ori_ir/src/ast/expr.rs (510L, all production): Touched by Section 04.1 (Expr::new inline). 10 lines over the limit. If adding code, split first.
- compiler/oric/src/query/mod.rs (582L, ~580L production; #[cfg(test)] mod tests; at L78): Touched by Section 05. 80 lines over the limit. If adding incremental parsing logic, split first (e.g., extract canonicalize_cached() and related functions to query/canonicalize.rs).
TDD ordering: Existing tests in error/tests.rs (899L), cursor/tests.rs (358L), and outcome/tests.rs (395L) serve as regression guards.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (02.3), MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”), covering which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, and where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE /commit-push using a valid conventional-commit type (e.g., build(diagnostics): ... — surfaced by section-02.3 retrospective; build, test, chore, ci, and docs are valid, while tools(...) is rejected by the lefthook commit-msg hook). This is mandatory even when nothing felt painful. If there are genuinely no gaps, document briefly: “Retrospective 02.3: no tooling gaps”. Update this subsection’s status in the section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md and canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
02.R Third Party Review Findings
- None.
02.N Completion Checklist
- lib.rs reduced to < 80 lines (index only)
- copier.rs split so that every resulting file is < 500 lines
- error/kind.rs split, every resulting file < 500 lines
- cursor/mod.rs reduced to < 500 lines
- outcome/mod.rs reduced to < 500 lines
- grammar/expr/patterns/match_patterns.rs reduced to < 500 lines
- grammar/item/function/mod.rs reduced to < 500 lines
- No file in ori_parse/src/ (excluding tests) exceeds 500 lines
- All decorative // === banners removed from outcome/mod.rs (5 instances) and error/kind.rs (~13 instances), replaced with plain // Section name comments
- All ori_parse tests pass: timeout 150 cargo t -p ori_parse
- All spec tests pass: timeout 150 cargo st
- ./test-all.sh green
- ./clippy-all.sh green
- Debug AND release builds pass: timeout 150 cargo t -p ori_parse --release
- Zero net behavioral changes (diffs consist only of moved code, mod/use/pub use adjustments, and banner-comment cleanup)
- /tpr-review passed: independent Codex review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER /tpr-review is clean.
- /improve-tooling retrospective completed: MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which diagnostics/ scripts you ran, which command sequences you repeated, where you added ad-hoc dbg!/tracing calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via a SEPARATE /commit-push. The retrospective is mandatory even when nothing felt painful; that is exactly when blind spots accumulate. See .claude/skills/improve-tooling/SKILL.md “Retrospective Mode” for the full protocol.
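Exit Criteria: find compiler/ori_parse/src -name '*.rs' ! -path '*/tests*' ! -name 'tests.rs' | xargs wc -l | sort -rn | head -10 shows no file exceeding 500 lines. Ignore the aggregate "total" row that wc -l prints: it sorts to the top and is not a file. Note also that wc -l counts any inline #[cfg(test)] module bodies, which the rule excludes. All test suites green.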
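As a cross-check on the exit criterion, here is a hypothetical Rust sketch that applies the same path filter as the find command (skip any path containing /tests) and the thresholds quoted from impl-hygiene.md at the top of this section (over 500 lines is BLOAT; around 450 is the point for a proactive split). It takes (path, line count) pairs rather than walking the file system, so the counts must come from elsewhere; the function, type, and constant names are illustrative.

```rust
/// Thresholds as quoted from .claude/rules/impl-hygiene.md.
const LIMIT: usize = 500;
const PROACTIVE_SPLIT: usize = 450; // the rule says "~450"

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Ok,
    SplitSoon,
    Bloat,
}

pub fn verdict(production_lines: usize) -> Verdict {
    if production_lines > LIMIT {
        Verdict::Bloat
    } else if production_lines >= PROACTIVE_SPLIT {
        Verdict::SplitSoon
    } else {
        Verdict::Ok
    }
}

/// Mirrors the `find` filter (drop paths containing "/tests", which covers
/// tests/ directories and tests.rs files), then returns files over the
/// limit, largest first, like `sort -rn`.
pub fn offenders<'a>(files: &[(&'a str, usize)]) -> Vec<(&'a str, usize)> {
    let mut out: Vec<(&'a str, usize)> = files
        .iter()
        .filter(|(path, _)| !path.contains("/tests"))
        .filter(|(_, lines)| verdict(*lines) == Verdict::Bloat)
        .copied()
        .collect();
    out.sort_by(|a, b| b.1.cmp(&a.1));
    out
}
```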