Section 03: Value Passing Optimization
Status: Not Started OPTIONAL — Profile-gated. Like Section 02, this optimizes the tree-walker which the VM replaces. Work this ONLY if profiling justifies it.
Goal: Eliminate avoidable Value::clone() work on the function call hot path. Current hot-path clones are a mix of cheap refcount bumps and a smaller number of real structure clones. Target: remove the expensive cases proven by profiling, without destabilizing semantics that the bytecode VM will replace anyway.
Context: The bind_parameters_with_defaults() function in compiler/ori_eval/src/exec/call/mod.rs:40-58 clones every argument via args[i].clone(). But the cost model is mixed:
- Heap-backed
Valuevariants already useArcviaHeap<T>, so many clones are shallow refcount bumps rather than deep copies FunctionValuecaptures are already shared viaArc<FxHashMap<_, _>>- The remaining potentially expensive clones are inline
FunctionValue/MemoizedFunctionValuepayload clones and repeated rebinding of captured values into fresh environments
Reference implementations:
- CPython:
Py_INCREFper parameter — refcount increment only, no deep clone - Lua: Arguments stay in caller’s stack frame; callee reads via offset — zero copy
Depends on: Section 01 (benchmarks). Independent of Section 02 (can be done in either order).
03.1 Parameter Binding Without Clone
File(s): compiler/ori_eval/src/exec/call/mod.rs
Current code (exec/call/mod.rs:40-58):
pub fn bind_parameters_with_defaults(
interpreter: &mut crate::Interpreter<'_>,
func: &FunctionValue,
args: &[Value],
) -> Result<(), ControlAction> {
let can_defaults = func.can_defaults();
for (i, param) in func.params.iter().enumerate() {
let value = if i < args.len() {
args[i].clone() // <-- Clone per parameter
} else if let Some(Some(can_id)) = can_defaults.get(i) {
interpreter.eval_can(*can_id)?
} else {
return Err(EvalError::new(...).into());
};
interpreter.env.define(*param, value, Mutability::Immutable);
}
Ok(())
}
Fix: Only move values instead of cloning when measurement shows clone work is still material after Section 02. Otherwise, preserve API stability and spend the effort on the bytecode path.
-
Benchmark clone share first: after Section 02, measure how much time remains in argument cloning versus environment setup and recursive dispatch. Do not do large API churn unless clone work is still material.
-
Only switch
eval_callto ownedVec<Value>if profiling justifies it:- Audit all call sites of
eval_callto verify args are not needed afterward - Account for memoization and any other callers that would immediately recreate clones, or the rewrite will simply move the cost around
- Audit all call sites of
-
If ownership rewrite is justified, move only ephemeral argument vectors: keep borrowed fast paths where the caller must retain the slice, and avoid forcing a global signature churn before the VM exists
-
Specialize for primitives: For
Value::Int,Value::Float,Value::Bool— these areCopy-sized. The enumclone()is already cheap, but ensure the compiler optimizes it to a memcpy. -
Matrix tests (in
compiler/ori_eval/src/exec/call/tests.rsand spec tests):- Dimensions: param count (0, 1, 2, 5) x param kind (primitive int/bool, collection [int], closure, struct) x default (no defaults, partial defaults, all defaults) x variadic (non-variadic, variadic)
- Semantic pin: function with 3 params where only 1 is provided (remaining 2 have defaults) produces correct result with move semantics
- Negative pin: double-use of moved arg produces compile error or runtime error (not silent corruption)
- TDD ordering: benchmark clone share FIRST (measure before changing), then write ownership tests if justified
-
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (03.1) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-03.1 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 03.1: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
03.2 Self-Binding and Capture Optimization
File(s): compiler/ori_eval/src/interpreter/function_call.rs, compiler/ori_eval/src/exec/call/mod.rs
Current self-binding (function_call.rs:51-52):
call_interpreter
.env
.define(self_name, func.clone(), Mutability::Immutable);
func.clone() on a Value::Function(FunctionValue) clones the FunctionValue struct which contains: params: Vec<Name> (clone = new Vec alloc), can_defaults: Vec<Option<CanId>> (clone = new Vec alloc), capabilities: Vec<Name> (clone = new Vec alloc), plus three cheap Arc bumps (captures: Arc<FxHashMap>, arena: SharedArena, canon: Option<SharedCanonResult>). This is a real cost for the Vec clones, but it must be measured before the representation is widened.
Fix: Store a reference or lightweight handle instead of cloning the entire FunctionValue.
-
If profiling shows self-binding is still hot, introduce a shared function handle for
Value::Function/Value::MemoizedFunctionso recursive self-binding is O(1) handle cloning instead of cloning inline metadata vectors. -
Keep closure semantics explicit: Ori closures currently capture by value from a frozen environment. Do not introduce lazy by-reference capture binding as an optimization without a separate semantic design, because it changes mutation and lifetime behavior.
-
Move capture optimization work toward the VM representation: for the tree-walker, shallow clones are acceptable if they are no longer dominant; for the VM, captures should become indexed closure storage rather than re-bound name maps.
-
Do not spend time on
ModeState::child(): it is an inline struct reset, not a heap allocation. Leave it alone unless a profiler shows otherwise. -
Matrix tests (in
compiler/ori_eval/src/interpreter/function_call.rstests and spec tests):- Dimensions: self-binding type (non-recursive, recursive, memoized-recursive) x capture count (0, 1, 5) x capture nesting (flat, closure-in-closure) x capability (no caps, 1 cap)
- Semantic pin: recursive function with memoization produces correct cached results after self-binding optimization
- Negative pin: mutable capture semantics unchanged (Ori captures by value — mutation of captured variable in closure does not affect outer scope)
- TDD ordering: profile self-binding cost FIRST, only proceed if material
-
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (03.2) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-03.2 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 03.2: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
03.R Third Party Review Findings
- None.
03.N Completion Checklist
- Any remaining hot-path
Value::clone()sites are justified by measurement - If self-binding is changed, the new representation is shared-handle based and regression-tested for recursion + memoization
-
cargo bench -p oric --bench interpreter -- closure_mapshows improvement - All spec tests pass:
cargo stgreen - All Rust tests pass:
cargo tgreen -
./test-all.shgreen - Plan annotation cleanup:
bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 03returns 0 annotations -
/tpr-reviewpassed -
/impl-hygiene-reviewpassed -
/improve-toolingretrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (whichdiagnostics/scripts you ran, which command sequences you repeated, where you added ad-hocdbg!/tracingcalls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE/commit-push. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See.claude/skills/improve-tooling/SKILL.md“Retrospective Mode” for the full protocol.
Exit Criteria: Combined with Section 02, per-call cost ≤5µs (down from 63µs). Ackermann A(3,7) completes in <10s. Zero unnecessary clones on the hot path (verified by audit of function_call.rs and exec/call/mod.rs).