# Section 05: Compiler Hot Path Optimization

Status: Not Started

Goal: Reduce compilation overhead and non-AOT test execution time to contribute toward the 30s overall target. All optimizations are in compiler code (Rust), not test code, and are guided by Section 03's profiling data — no speculative changes. The primary surfaces are: (a) test binary compilation time (sequential), (b) crate test execution for the slowest non-AOT crates, and (c) workspace-level parallelism configuration.

Context: The 59s `cargo t` wall time includes parallel execution of all crate test binaries. `cargo test --workspace` runs crate tests concurrently (up to `--jobs` parallelism), so the ~23s "non-AOT" figure is NOT independent of the 35.6s AOT time — non-AOT crates run in parallel with AOT tests. The 59s wall time likely comprises: (a) sequential compilation of all test binaries, (b) sequential cargo orchestration overhead, and (c) parallel test execution dominated by AOT. Section 03's measurement methodology separates compilation time from execution time (via `cargo test --no-run` vs `cargo test`), which is essential for understanding the actual optimization surface. The largest crates by test execution time are `ori_eval` (4.5s incl. compilation), `ori_arc` (3.4s), `ori_patterns` (2.6s), and `ori_ir` (2.5s).

Depends on: Section 03 (Profiling Infrastructure) — flamegraphs and per-crate timing identify which functions to optimize.
## 05.1 Crate Compilation Time

File(s): workspace `Cargo.toml`, individual crate `Cargo.toml` files

Part of the `cargo t` wall time is spent COMPILING the test binaries, not running them. This is especially significant for large crates like `ori_arc` (~60K lines total: ~29.5K source + ~30.5K test code).
- Measure compilation time separately from test execution time:

  ```sh
  # Compilation only (no test execution):
  time cargo test --workspace --no-run
  # Test execution only (pre-compiled):
  time cargo test --workspace
  # Subtract compilation time = execution time
  ```
- If compilation is >10s of the total 59s:
  - Check codegen units: Cargo defaults to 256 codegen units for incremental builds and 16 for non-incremental builds; dev/test builds are incremental by default. No `[profile.test]` or `[profile.dev]` sections exist in `Cargo.toml` (verified 2026-03-25), so defaults apply. If the effective value is lower, consider pinning `codegen-units = 256` under `[profile.dev]` for faster compilation (at the cost of slightly slower test execution).
  - Check incremental compilation: no `incremental` setting exists in `Cargo.toml` or `.cargo/config.toml` (verified 2026-03-25). Rust defaults to `incremental = true` for dev/test, which is correct.
  - Check for heavy proc-macros: no proc-macro dependencies were found in any compiler crate (verified 2026-03-25), so this is not a compilation bottleneck.
  - Large crate split evaluation: `ori_arc` (~60K lines, 1,012 test functions across 38 test files) is the largest crate. Any change to `ori_arc` triggers recompilation of the entire crate plus all of its tests. Evaluate whether splitting `ori_arc` into smaller crates would reduce incremental rebuild time. This is a trade-off analysis, not a recommendation — splitting adds dependency complexity.
- If compilation is <10s: skip the above and focus on test execution optimization instead.
### Test Strategy

- TDD ordering: `Cargo.toml` profile changes are configuration, not code. The test is that the entire suite continues working:
  - Before change: record `timeout 150 cargo t` pass/fail counts
  - After change: verify identical pass/fail counts
  - Before change: record `timeout 150 cargo b` success
  - After change: verify `timeout 150 cargo b` still succeeds
- Measurement: record compilation time (`cargo test --workspace --no-run`) vs execution time. Determine which dominates.
- Validation: any `Cargo.toml` changes must not break `cargo t` or `cargo b`.
- Debug and release: both `timeout 150 cargo t` and `timeout 150 cargo b --release` must pass after changes.
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (05.1) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection's debugging journey (per `.claude/skills/improve-tooling/SKILL.md` "Per-Subsection Workflow"): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-05.1 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid, `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: "Retrospective 05.1: no tooling gaps". Update this subsection's `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
## 05.2 Test Execution Hot Paths

File(s): specific compiler source files identified by Section 03's flamegraphs

The flamegraphs from Section 03 will identify the top 10 hottest functions during test execution. This subsection has two parts: (a) tasks that can be started immediately (known-crate analysis), and (b) tasks driven by Section 03's flamegraph findings. Part (b) is conditional on profiling results — it proceeds with whatever the top 10 functions turn out to be.
- Pre-profiling: identify large test suites. Run `cargo test -p <crate> -- --list 2>/dev/null | wc -l` for each crate to count test functions. The crates with the most tests are likely where execution time improvements have the highest payoff. Prioritize: `ori_arc` (1,012 tests), `ori_eval`, `ori_patterns`.
- Pre-profiling: identify expensive test patterns. Search for test patterns that are inherently expensive:

  ```sh
  # Tests that compile+run Ori programs (mini-pipelines within the test):
  grep -rn "compile_and_run\|run_source\|eval_source\|check_source" compiler/*/tests/ compiler/*/src/**/tests.rs
  # Tests that spawn subprocesses:
  grep -rn "Command::new" compiler/*/tests/ compiler/*/src/**/tests.rs
  # Tests with large inline source strings (>50 lines):
  grep -c ' "' compiler/*/tests/**/*.rs | sort -t: -k2 -rn | head -20
  ```

  Each of these patterns has a different optimization approach: mini-pipeline tests benefit from shared compilation state (see the sketch after this item), subprocess tests benefit from batching, and large-source tests benefit from fixture reuse.
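To make the shared-compilation-state idea concrete, here is a minimal sketch of the fixture pattern. `compile_module` and `Module` are hypothetical stand-ins invented for illustration (the real pipeline APIs differ), and the sketch is illustrative only, since this section itself does not modify test code:

```rust
use std::sync::OnceLock;

// Hypothetical stand-ins for the real pipeline API (invented for illustration).
struct Module;
fn compile_module(_source: &str) -> Module {
    // ... parse, type-check, lower ...
    Module
}

// Compile the shared fixture once per test binary, not once per test.
static FIXTURE: OnceLock<Module> = OnceLock::new();

fn fixture() -> &'static Module {
    FIXTURE.get_or_init(|| compile_module("fn main() -> () { }"))
}

#[test]
fn first_property_holds() {
    let module = fixture(); // first caller pays the compilation cost
    let _ = module;
}

#[test]
fn second_property_holds() {
    let module = fixture(); // subsequent callers reuse the compiled module
    let _ = module;
}
```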
- Review flamegraph top 10: list the 10 hottest functions from Section 03's analysis. For each:
  - Read the function's source code
  - Understand why it's hot (called frequently? expensive per call? both?)
  - Determine whether optimization is possible without changing behavior
- Common optimization patterns to look for (a sketch follows this list):
  - Unnecessary cloning: look for `.clone()` in hot paths. Replace with borrows or `Cow<>` where possible.

    ```sh
    # Find .clone() calls in hot crate source files
    grep -n "\.clone()" compiler/ori_arc/src/*.rs compiler/ori_types/src/**/*.rs
    ```

  - Excessive allocation: look for `Vec::new()` or `String::from()` in tight loops. Replace with pre-allocated buffers or arena allocation.
  - Hash map overhead: look for `HashMap::get()` or `HashMap::insert()` in hot paths. If keys are small integers, consider a `Vec` indexed by the integer; both lookups are O(1), but direct indexing avoids hashing and has a much lower constant factor.
  - Missing `#[inline]`: cross-crate hot functions without `#[inline]` carry call overhead. Check whether any top-10 hot functions cross crate boundaries. See the memory note on the parser `#[inline]` optimization (20-30% gain on the cross-crate `Index` trait).
  - Redundant computation: look for the same value being computed multiple times. Salsa should handle this via memoization, but non-Salsa code may recompute.
  - Linear scans over hash lookups: for collections >8 items, linear scans are slower than hash lookups. Look for `.iter().find()` or `.iter().position()` patterns.
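The sketch below illustrates the first four patterns on invented stand-in code (`normalize`, `collect_ids`, and `SymbolTable` are hypothetical names, not real compiler APIs; the actual targets come from Section 03's flamegraphs):

```rust
use std::borrow::Cow;

/// Unnecessary cloning: return a borrow in the common case and allocate
/// only when a modified copy is actually needed.
fn normalize(name: &str) -> Cow<'_, str> {
    if name.contains(' ') {
        Cow::Owned(name.replace(' ', "_")) // rare case: one allocation
    } else {
        Cow::Borrowed(name) // common case: zero allocations
    }
}

/// Excessive allocation: pre-size the buffer instead of growing it in a loop.
fn collect_ids(items: &[(u32, &str)]) -> Vec<u32> {
    let mut ids = Vec::with_capacity(items.len()); // one allocation up front
    for (id, _) in items {
        ids.push(*id);
    }
    ids
}

/// Hash map overhead: when keys are small dense integers (e.g. interned
/// symbol IDs), a Vec indexed by the key avoids hashing entirely.
struct SymbolTable {
    names: Vec<Option<String>>, // index = symbol id
}

impl SymbolTable {
    /// Missing #[inline]: tiny accessors called across crate boundaries
    /// benefit from an explicit hint so the call can be inlined.
    #[inline]
    fn name(&self, id: u32) -> Option<&str> {
        self.names.get(id as usize)?.as_deref()
    }
}

fn main() {
    assert_eq!(normalize("foo bar"), "foo_bar");
    assert_eq!(normalize("foo"), "foo");
    assert_eq!(collect_ids(&[(1, "a"), (2, "b")]), vec![1, 2]);
    let table = SymbolTable { names: vec![Some("root".to_string())] };
    assert_eq!(table.name(0), Some("root"));
    assert_eq!(table.name(9), None);
}
```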
- For each identified optimization:
  - Measure the function's cost BEFORE (use `cargo bench` if a benchmark exists, or a targeted timing test; see the sketch after this list)
  - Implement the optimization
  - Measure AFTER
  - Verify no behavioral change (tests pass identically)
  - Record the improvement
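Where no benchmark exists, a throwaway timing probe is enough for a before/after comparison. A minimal sketch, assuming a hypothetical `hot_function` stand-in for the real flamegraph target; run it with `cargo test --release -- --ignored --nocapture` so the timing reflects optimized code:

```rust
use std::hint::black_box;
use std::time::Instant;

// Hypothetical stand-in for the hot function identified by the flamegraph.
fn hot_function(input: &[u64]) -> u64 {
    input.iter().copied().sum()
}

#[test]
#[ignore] // opt-in timing probe, not part of the normal suite
fn time_hot_function() {
    let input: Vec<u64> = (0..1_000_000).collect();
    const ITERS: u32 = 100;

    let start = Instant::now();
    for _ in 0..ITERS {
        // black_box keeps the optimizer from deleting the measured work
        black_box(hot_function(black_box(&input)));
    }
    let per_call = start.elapsed() / ITERS;
    println!("hot_function: {per_call:?} per call over {ITERS} iterations");
}
```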
### Test Strategy

- TDD ordering: hot path optimizations modify compiler internals. For each optimization:
  - Record full test pass/fail output BEFORE the change
  - Apply the optimization (clone reduction, allocation removal, `#[inline]`, etc.)
  - Verify pass/fail output is identical AFTER the change
  - If the optimization changes a public API (unlikely but possible), write a targeted unit test first
- Matrix: every optimization must preserve all existing test behavior. `timeout 150 cargo t` must be green after each change. Since hot path optimizations are not type- or pattern-dependent (they affect all types equally), the existing test suite IS the matrix — but verify both:
  - `timeout 150 cargo t` (debug build, exercises more assertions)
  - `timeout 150 cargo t --release` (release build, different optimization behavior)
- Semantic pin: the existing test suite serves as the semantic pin for behavioral equivalence. Each optimization should also have a before/after timing measurement showing a measurable improvement.
- Measurement: per-function timing before/after; aggregate improvement on `cargo t` wall time.
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (05.2) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection's debugging journey (per `.claude/skills/improve-tooling/SKILL.md` "Per-Subsection Workflow"): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-05.2 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid, `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: "Retrospective 05.2: no tooling gaps". Update this subsection's `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
## 05.3 Workspace-Level Optimization

File(s): `test-all.sh`, `.cargo/config.toml`, `Cargo.toml`

Optimizations at the workspace/build-system level can reduce overhead that affects all crates.
- Parallel test execution: check whether `cargo t` is already running tests in parallel:

  ```sh
  # Default: Rust runs tests within each crate in parallel (threads)
  # Crate-level parallelism depends on cargo's job count
  cargo t -- --test-threads=N  # N = number of parallel test threads
  ```

  Current `cargo t` uses default parallelism. Check whether increasing or decreasing `--test-threads` improves wall time.
- Cargo parallel jobs: check whether the number of parallel compilation jobs is optimal:

  ```sh
  # Check current setting
  grep "jobs" .cargo/config.toml
  # Default: number of CPU cores
  ```
- Profile-guided optimization (PGO) for test builds: not recommended (PGO optimizes for specific workloads, but test builds need fast compilation, not fast execution). Document this decision.
- LTO for test builds: verify LTO is OFF for test builds (LTO is slow to compile). If LTO is accidentally enabled for the test profile, disable it.
- Evaluate `cargo-nextest`: `cargo-nextest` runs each test as an individual process (better isolation) and can be faster than `cargo test` for large workspaces due to better parallelism. It is NOT currently installed (verified 2026-03-25).
  - Install: `cargo install cargo-nextest`
  - Measure: `hyperfine 'cargo nextest run --workspace' 'cargo test --workspace' --warmup 1 --runs 3`
  - Decision: if nextest wall time is >10% faster, document it and make it the default in `scripts/bench-tests.sh`. If slower or equivalent, uninstall (`cargo uninstall cargo-nextest`) and document the comparison result.
  - Note: `cargo-nextest` may interact differently with AOT tests (each test already spawns 2 subprocesses; nextest adds another process layer). Profile to verify nextest doesn't degrade AOT performance specifically.
### Test Strategy

- TDD ordering: workspace-level changes affect the build system, not compiler behavior. The "test" is the full suite:
  - Before each change: record `timeout 150 cargo t` pass/fail counts AND wall time
  - After each change: verify identical pass/fail counts AND record the wall time delta
- Validation: any workspace-level change must not break `cargo t`, `cargo b`, or `./test-all.sh`.
- Debug and release: `timeout 150 cargo t` and `timeout 150 cargo b --release` must both pass.
- Measurement: wall time before/after each change, measured with `hyperfine` for statistical validity (3+ runs).
- `/tpr-review` passed — independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (05.3) — MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection's debugging journey (per `.claude/skills/improve-tooling/SKILL.md` "Per-Subsection Workflow"): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via a SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-05.3 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid, `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: "Retrospective 05.3: no tooling gaps". Update this subsection's `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check — run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
## 05.R Third Party Review Findings

- None.
## 05.4 Completion Checklist

- Compilation time vs execution time breakdown recorded
- Flamegraph top 10 hot functions analyzed and documented
- Each identified optimization measured (before/after per function)
- Workspace-level optimizations evaluated (parallel threads, nextest, etc.)
- Compilation time measured separately (`cargo test --no-run`): ???s
- Non-AOT crate execution times measured individually (per crate): documented
- All tests pass identically (no behavioral changes)
- Optimizations documented with measured impact
- `timeout 150 cargo t` green
- `/tpr-review` passed — independent Codex review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed — implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER `/tpr-review` is clean.
- `/improve-tooling` retrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section's debugging journey (which `diagnostics/` scripts you ran, which command sequences you repeated, where you added ad-hoc `dbg!`/`tracing` calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via a SEPARATE `/commit-push`. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See `.claude/skills/improve-tooling/SKILL.md` "Retrospective Mode" for the full protocol.
Exit Criteria: Each identified optimization is individually measured (before/after) and documented. No test code was modified. The flamegraph’s top 10 hot functions have been addressed (either optimized or documented as “acceptable — not optimizable without behavioral changes”). If Section 03’s profiling reveals that non-AOT crates fully overlap with AOT execution (i.e., they finish before AOT does), the optimization focus shifts to compilation time reduction, which benefits wall time by reducing the sequential pre-test build phase.