Section 06: Verification
Status: Not Started Goal: All parser frontend benchmarks show measurable improvement compared to Section 01 baselines. All test suites green. Findings documented for future reference.
Context: This section re-measures all benchmarks established in Section 01 after all optimization work (Sections 02-05) is complete. The before/after comparison proves the plan delivered measurable value and identifies which optimizations had the most impact.
Depends on: All prior sections (01-05).
06.1 Benchmark Comparison
File(s): plans/parser-perf/baselines.json
Re-run the complete benchmark suite and compare against Section 01 baselines.
-
Run all benchmarks with
--baseline parser-perf-v0to compare against saved baselines:cargo bench -p oric --bench lexer_core -- --baseline parser-perf-v0 cargo bench -p oric --bench lexer -- --baseline parser-perf-v0 cargo bench -p oric --bench parser -- --baseline parser-perf-v0 -
Record final metrics in
baselines.jsonalongside original baselines:{ "final_results": { "date": "YYYY-MM-DD", "commit": "<hash>", "lexer_core_raw_throughput_mib_s": null, "lexer_cooked_throughput_mib_s": null, "parser_raw_throughput_mib_s": null, "parser_salsa_throughput_mib_s": null, "salsa_overhead_percent": null }, "improvement": { "lexer_cooked_percent": null, "parser_raw_percent": null, "parser_salsa_percent": null, "salsa_overhead_delta": null } } -
Verify new workloads from Section 01.2 are included in the final run:
parser/salsa_overhead/rawandparser/salsa_overhead/via_salsa(Salsa isolation)- String-heavy workload (in
lexer.rs) - Expression-heavy workload (in
parser.rs)
-
Identify which optimizations had the most impact:
- Cross-crate inlining (Sections 03.1, 04.1)
- Arena pre-allocation (Sections 03.2, 04.2)
- Cooker/parser micro-optimizations (Sections 03.3, 04.4)
- Incremental parsing (Section 05.2)
-
Verify no benchmark regressions: every benchmark must be equal or better than baseline.
-
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (06.1) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.1 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.1: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
06.2 Regression Testing
Run the complete test suite to verify zero behavioral regressions.
-
timeout 150 ./test-all.sh— all tests pass -
./clippy-all.sh— no warnings -
./fmt-all.sh— no formatting changes needed -
timeout 150 cargo st— all Ori spec tests pass -
timeout 150 cargo t -p ori_lexer— all lexer tests pass -
timeout 150 cargo t -p ori_parse— all parser tests pass -
timeout 150 cargo t -p oric— all Salsa integration tests pass -
Verify debug AND release builds:
timeout 150 cargo t cargo b --release # build first (no timeout on compilation) timeout 150 cargo t --release # test execution only, respects mandatory 150s timeout -
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (06.2) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.2 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.2: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
06.3 Documentation
Update project documentation to reflect changes and record findings.
-
Update CLAUDE.md
Parser Performancememory section with new baseline numbers. -
Update memory file
/home/eric/.claude/projects/-home-eric-projects-ori-lang/memory/MEMORY.mdwith new benchmark results. -
If file splits changed crate structure, update
.claude/rules/parse.mdkey files section. -
If incremental parsing was activated, document the
ParseCache/parse_cache()pattern in.claude/rules/compiler.md. -
If incremental parsing was activated, document
compute_text_change()in.claude/rules/parse.mdincremental section. -
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (06.3) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-06.3 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 06.3: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
06.R Third Party Review Findings
- None.
06.N Completion Checklist
- All benchmarks re-measured against Section 01 baselines
- Improvement percentages recorded in
baselines.json - No benchmark regressions detected
-
./test-all.shgreen -
./clippy-all.shgreen -
./fmt-all.shclean - Debug and release builds pass all tests
- CLAUDE.md and memory updated with new baselines
- Plan documentation updated
-
/tpr-reviewpassed — independent Codex review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER/tpr-reviewis clean. -
/improve-toolingretrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (whichdiagnostics/scripts you ran, which command sequences you repeated, where you added ad-hocdbg!/tracingcalls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE/commit-push. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See.claude/skills/improve-tooling/SKILL.md“Retrospective Mode” for the full protocol.
Exit Criteria: baselines.json contains populated before/after results showing measurable improvement. All test suites green in both debug and release modes. Documentation updated. parser_raw_throughput_mib_s final > parser_raw_throughput_mib_s baseline.