Section 04: Late COW Compound Contraction
Status: Not Started
Goal: Add a late contraction pass that recognizes COW sequences (IsShared + branch + clone + mutate) in the ARC IR and contracts them into compound operations. This follows LLVM’s Expand/Optimize/Contract pipeline pattern — the AIMS pipeline “expands” into primitive IsShared/Set instructions, optimizes them, and now we “contract” back into efficient compound forms for the backend.
Context: LLVM’s ObjCARCContract.cpp fuses load + retain + release + store into a single objc_storeStrong call. The key insight: some instruction sequences implement a higher-level semantic operation, and the backend can handle the compound form more efficiently (single call, no branch prediction overhead, better register allocation). Ori’s COW protocol (IsShared check → branch → clone-if-shared → mutate) is exactly this pattern. The AIMS pipeline already computes CowMode in decide.rs with cross-dimensional proofs — this section extends that to the IR and backend layers.
Reference implementations:
- LLVM ObjCARCContract.cpp:320-413: tryToContractReleaseIntoStoreStrong() pattern-matches load + retain + release + store and replaces the sequence with objc_storeStrong. Runs late (after main optimization) to avoid disrupting dataflow analysis.
- LLVM pipeline: ObjCARCExpand (canonicalize) → ObjCARCOpts (optimize) → ObjCARCContract (fuse). The contract pass runs after optimization because compound forms are harder to analyze.
Depends on: Section 01 (statistics to count contractions), Section 02 (selective barriers shape which RC ops surround COW diamond patterns; barrier changes must be in place before contraction pattern-matching is finalized).
04.1 Define Compound COW IR Instruction
File(s): compiler/ori_arc/src/ir/instr.rs (not mod.rs — ArcInstr is defined in instr.rs)
Add a compound COW instruction to ARC IR that represents the entire uniqueness-check + potential-clone + mutate sequence as a single instruction.
Scope warning: ArcInstr is matched in 107 files across ori_arc, ori_llvm, ori_repr, and oric. Adding a new variant requires updating every exhaustive match. Most non-test matches use _ => {} wildcard for unknown variants, but many key dispatch sites (defined_var(), used_vars(), uses_var(), is_owned_position(), substitute_var() in instr.rs, plus transfer functions, verify, emit_reuse, etc.) require explicit handling. Plan for ~30-50 match arm updates.
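To make the scope warning concrete, here is a minimal sketch of how an exhaustive match forces every accessor to handle a new compound variant. The types `VarId`, `Mode`, and `Instr` are hypothetical stand-ins, not the real `ArcVarId`, `CowMode`, or `ArcInstr` from ori_arc, and the choice to rewrite only use operands in `substitute_var` is an assumption made for illustration.

```rust
// Toy stand-ins for illustration only; the real types live in ori_arc.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct VarId(u32);

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Mode {
    StaticUnique,
    Dynamic,
    StaticShared,
}

#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
enum Instr {
    Set { base: VarId, field: u32, value: VarId },
    IsShared { dst: VarId, var: VarId },
    // The new compound variant: defines `dst`, uses `src` and `value`.
    CowMutate { dst: VarId, src: VarId, field: u32, value: VarId, mode: Mode },
}

impl Instr {
    /// The variable this instruction defines, if any.
    fn defined_var(&self) -> Option<VarId> {
        // No wildcard arm: adding a variant makes this a compile error
        // until the new arm is written.
        match self {
            Instr::Set { .. } => None,
            Instr::IsShared { dst, .. } => Some(*dst),
            Instr::CowMutate { dst, .. } => Some(*dst),
        }
    }

    /// The variables this instruction reads.
    fn used_vars(&self) -> Vec<VarId> {
        match self {
            Instr::Set { base, value, .. } => vec![*base, *value],
            Instr::IsShared { var, .. } => vec![*var],
            Instr::CowMutate { src, value, .. } => vec![*src, *value],
        }
    }

    /// Rewrite every use of `from` to `to` (assumption: definitions are left alone).
    fn substitute_var(&mut self, from: VarId, to: VarId) {
        let swap = |v: &mut VarId| if *v == from { *v = to };
        match self {
            Instr::Set { base, value, .. } => {
                swap(base);
                swap(value);
            }
            Instr::IsShared { var, .. } => swap(var),
            Instr::CowMutate { src, value, .. } => {
                swap(src);
                swap(value);
            }
        }
    }
}
```

Because none of the three methods uses a `_ =>` arm, removing the `CowMutate` arm from any of them fails to compile, which is the property the "compiler as exhaustiveness checker" strategy relies on.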
- Add CowMutate variant to ArcInstr in ir/instr.rs:

  Import requirement: CowMode is defined in crate::uniqueness::lattice (compiler/ori_arc/src/uniqueness/lattice.rs:87). Add use crate::uniqueness::CowMode; to the ir/instr.rs imports. CowMode currently derives Clone, Copy, Debug, PartialEq, Eq, Hash, which is sufficient for ArcInstr’s non-cache derives. However, ArcInstr also has #[cfg_attr(feature = "cache", derive(serde::Serialize, serde::Deserialize))], so CowMode MUST also get this derive. Add it to CowMode’s definition in uniqueness/lattice.rs.

  Structural note: CowMutate introduces a dst field, unlike Set, which modifies base in-place without a separate destination. This means CowMutate is both a definition (dst) AND a use (src, value). This mirrors the COW diamond semantics: the result may be the original or a clone.

      /// Compound COW mutation: contracts IsShared+branch+clone+Set.
      ///
      /// Semantics: if `src` is uniquely referenced, mutate field in-place
      /// and bind `dst = src`. If shared, clone `src` into `dst`, then
      /// mutate the clone's field.
      /// The `cow_mode` annotation from AIMS determines the strategy:
      /// - `StaticUnique`: skip the check, mutate directly
      /// - `Dynamic`: emit the full check+branch+clone+mutate sequence
      /// - `StaticShared`: always clone (no check needed)
      CowMutate {
          dst: ArcVarId,
          src: ArcVarId,
          ty: Idx,      // Note: Idx, not TypeId; ARC IR uses Idx for types
          field: u32,   // Note: u32, not usize; consistent with Set/Project
          value: ArcVarId,
          cow_mode: CowMode,
          strategy: RcStrategy,
      },

- Implement defined_var() (Some(dst)), used_vars() ([src, value]), uses_var(), is_owned_position(), and substitute_var() for the new variant in instr.rs.
- Update all exhaustive match arms on ArcInstr. The 107 files that reference ArcInstr:: break into two categories:

  Category A: Exhaustive matches that MUST be updated (compiler error if missing):
  - ir/instr.rs: 5 methods: defined_var(), used_vars(), uses_var(), is_owned_position(), substitute_var()
  - verify/mod.rs: verification rules (must add verification for CowMutate)
  - aims/transfer/mod.rs: backward transfer functions
  - aims/emit_rc/: forward walk, dead cleanup, edge cleanup (multiple files)
  - aims/emit_reuse/: reuse detection
  - aims/intraprocedural/: block analysis
  - block_merge/: merge, downgrade, select
  - liveness/mod.rs, drop/mod.rs, borrow/: various analyses
  - oric/src/arc_dump/instr.rs: debug dump (must format CowMutate)
  - ori_llvm/src/codegen/arc_emitter/instr_dispatch.rs: LLVM emission (Section 04.3)
  - ori_repr/: narrowing/range analysis

  Category B: Wildcard matches (_ =>) that compile but should be reviewed:
  - Most ori_repr range analysis files use _ => {} for unknown instructions: no update needed, but review for correctness
  - Test files that construct specific instruction sequences: no update needed

  Strategy for finding all sites:
  - Add the CowMutate variant to ArcInstr
  - Run cargo c -p ori_arc: the compiler reports ALL exhaustive match errors in ori_arc
  - Fix all ori_arc matches
  - Run cargo c -p ori_llvm: reports ori_llvm matches
  - Run cargo c -p ori_repr: reports ori_repr matches
  - Run cargo c (full build): catches any remaining
  - The Rust compiler is the exhaustiveness checker, so no manual grep is needed for non-wildcard matches

  Conservative default for new arms: In analysis passes that don’t need special handling for CowMutate, treat it the same as Set (mutation semantics). In RC emission, treat the src and value operands as used and dst as defined.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (04.1): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-04.1 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.1: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
04.2 Implement COW Contraction Pass
File(s): compiler/ori_arc/src/aims/emit_rc/cow_contract.rs (new file — placed in emit_rc/ alongside cow.rs, NOT in aims/contract/ which is for MemoryContract interprocedural summaries)
Pattern-match COW sequences and replace with CowMutate.
- Define the COW sequence pattern to match:

      Pattern spans 4 blocks (diamond CFG):

      Block A (head):
        body: [..., IsShared { dst: flag, var: src }]
        terminator: Branch { cond: flag, then_block: B_shared, else_block: B_unique }

      Block B_shared (shared path):
        body: [RcInc(src), Apply(clone_fn, src) → cloned, Set { base: cloned, ... }, ...]
        terminator: Jump { target: C_merge, args: [cloned] }

      Block B_unique (unique path):
        body: [Set { base: src, ... }]
        terminator: Jump { target: C_merge, args: [src] }

      Block C_merge (phi):
        params: [(result, ty)]
        ...

  Important: Branch is an ArcTerminator, not an ArcInstr. The pattern spans basic blocks: this is a cross-block pattern match, not a single-block peephole. The contraction replaces the relevant parts of all 4 blocks with a single CowMutate instruction in the head block.

- Implement contract_cow_sequences():

      /// Scan the CFG for COW diamond patterns and contract them into CowMutate.
      ///
      /// Returns the number of contractions performed.
      pub fn contract_cow_sequences(func: &mut ArcFunction) -> usize {
          // Walk blocks looking for:
          // 1. Block ending with Branch whose cond is defined by IsShared in same block
          // 2. Both successors have Set on the same field
          // 3. One successor also has a clone (shared path)
          // 4. Both successors Jump to the same merge block
          // Replace the diamond with a CowMutate in the head block.
          todo!("contraction not implemented yet")
      }

- Place in pipeline: after realize_annotations() (step 10), before final verify() (step 11). This is the “contract” phase, which runs after all AIMS decisions are made.

  Pipeline interaction: realize_annotations() computes CowAnnotations (a (block_idx, instr_idx) -> CowMode map) for each Set instruction at a COW site. The contraction pass consumes these annotations: it reads the CowMode from the annotation map for each Set that participates in a COW diamond, embeds it into the CowMutate instruction, and removes the annotation entry (since the CowMode now lives in the instruction). The LLVM emitter handles CowMutate directly (reading CowMode from the instruction fields) rather than consulting the annotation map.

  Annotation index invalidation: Replacing a 4-block diamond with a single CowMutate instruction changes block indices and instruction indices. The contraction pass MUST update CowAnnotations indices for any remaining (non-contracted) annotations. Process contractions in reverse block order to avoid invalidating earlier indices.

  Prerequisite: aims_pipeline.rs splitting (see overview Prerequisites table). The pipeline file is 590 lines, already past the 500-line limit. Before adding the contraction invocation, complete the prerequisite extraction task. If Section 03 or 05 is implemented first, they will have already done this.
- Track cow_contractions in SynergyMetrics.
- TPR checkpoint: /tpr-review covering 04.1–04.2 implementation work
- Unit tests (TDD: write BEFORE implementing contract_cow_sequences()):
  - Simple COW diamond → contracts to CowMutate
  - Non-COW diamond (different variables) → no contraction
  - StaticUnique annotation → CowMutate with StaticUnique mode
  - StaticShared → CowMutate with StaticShared mode
  - Multiple COW diamonds in one function → all contracted
  - COW diamond with extra instructions in head block → correctly preserves non-COW instructions
  - Negative pin: Diamond where the two paths mutate DIFFERENT fields → must NOT contract (different semantics)
- Semantic pin: Test that a COW mutation program produces identical output with and without contraction enabled. Use a flag (or the ORI_SKIP_ARC_OPTS env var from Section 06) to disable contraction and compare outputs.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (04.2): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-04.2 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.2: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
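The diamond match from 04.2 can be sketched on a toy CFG. Everything below is a hypothetical model: `Var`, `Instr`, `Term`, `Block`, `match_diamond`, `contract_cow_diamonds`, and the `diamond` fixture are illustrative names, not the ori_arc API. The sketch assumes each arm contains only the COW instructions (no RcInc, no trailing instructions), ignores the CowMode annotation, and leaves the dead arm blocks in place so that no block indices shift.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Var(u32);

#[derive(Clone, Debug, PartialEq, Eq)]
enum Instr {
    IsShared { dst: Var, var: Var },
    CloneObj { dst: Var, src: Var },
    Set { base: Var, field: u32, value: Var },
    CowMutate { dst: Var, src: Var, field: u32, value: Var },
}

#[derive(Clone, Debug, PartialEq, Eq)]
enum Term {
    Branch { cond: Var, then_block: usize, else_block: usize },
    Jump { target: usize, arg: Option<Var> },
    Return,
}

struct Block {
    param: Option<Var>,
    body: Vec<Instr>,
    term: Term,
}

/// Returns (result, src, field, value, merge) if `head` starts a COW diamond.
fn match_diamond(blocks: &[Block], head: usize) -> Option<(Var, Var, u32, Var, usize)> {
    let h = blocks.get(head)?;
    // 1. Head ends with IsShared whose result feeds the Branch.
    let (flag, src) = match h.body.last()? {
        Instr::IsShared { dst, var } => (*dst, *var),
        _ => return None,
    };
    let (shared, unique) = match &h.term {
        Term::Branch { cond, then_block, else_block } if *cond == flag => (*then_block, *else_block),
        _ => return None,
    };
    let (s, u) = (blocks.get(shared)?, blocks.get(unique)?);
    // 2. Shared arm: clone `src`, then Set on the clone.
    let (cloned, field, value) = match s.body.as_slice() {
        [Instr::CloneObj { dst, src: from }, Instr::Set { base, field, value }]
            if *from == src && base == dst => (*dst, *field, *value),
        _ => return None,
    };
    // 3. Unique arm: the same Set applied directly to `src`.
    match u.body.as_slice() {
        [Instr::Set { base, field: f, value: v }]
            if *base == src && *f == field && *v == value => {}
        _ => return None,
    }
    // 4. Both arms jump to the same merge block, passing clone / original.
    let merge = match (&s.term, &u.term) {
        (Term::Jump { target: a, arg: Some(x) }, Term::Jump { target: b, arg: Some(y) })
            if a == b && *x == cloned && *y == src => *a,
        _ => return None,
    };
    let result = blocks.get(merge)?.param?;
    Some((result, src, field, value, merge))
}

/// Contracts every diamond into a CowMutate in its head block; returns the count.
fn contract_cow_diamonds(blocks: &mut [Block]) -> usize {
    let mut count = 0;
    for head in (0..blocks.len()).rev() {
        let Some((result, src, field, value, merge)) = match_diamond(blocks, head) else {
            continue;
        };
        let h = &mut blocks[head];
        h.body.pop(); // drop the IsShared
        h.body.push(Instr::CowMutate { dst: result, src, field, value });
        h.term = Term::Jump { target: merge, arg: None };
        blocks[merge].param = None; // `result` is now defined by CowMutate
        count += 1;
    }
    count
}

/// Test fixture: a 4-block diamond; `unique_field` is the field the unique arm sets.
fn diamond(unique_field: u32) -> Vec<Block> {
    let (src, val, flag, cloned, result) = (Var(1), Var(2), Var(10), Var(11), Var(12));
    vec![
        Block { param: None, body: vec![Instr::IsShared { dst: flag, var: src }],
                term: Term::Branch { cond: flag, then_block: 1, else_block: 2 } },
        Block { param: None,
                body: vec![Instr::CloneObj { dst: cloned, src },
                           Instr::Set { base: cloned, field: 0, value: val }],
                term: Term::Jump { target: 3, arg: Some(cloned) } },
        Block { param: None, body: vec![Instr::Set { base: src, field: unique_field, value: val }],
                term: Term::Jump { target: 3, arg: Some(src) } },
        Block { param: Some(result), body: vec![], term: Term::Return },
    ]
}
```

Calling `contract_cow_diamonds(&mut diamond(0))` contracts the single diamond, while `diamond(1)` (arms mutating different fields) is left untouched, which corresponds to the negative pin in the unit-test list. A real pass would also have to confirm that the arm blocks have no other predecessors and that `flag` has no other uses, which this sketch does not check.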
04.3 LLVM Emission for Compound COW
File(s): compiler/ori_llvm/src/codegen/arc_emitter/instr_dispatch.rs (instruction dispatch is in instr_dispatch.rs, not mod.rs)
Emit efficient LLVM IR for CowMutate instructions.
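As a reference point for what the emitted code must do at run time, the three modes can be modeled with Rust's own reference-counted pointer. This is only a semantic analogy under stated assumptions: `Point`, `cow_set_x`, and the local `CowMode` enum are hypothetical stand-ins, and `std::rc::Rc` substitutes for Ori's runtime RC header. `Rc::make_mut` clones the value only when other strong references exist, which matches the Dynamic behavior described for CowMutate.

```rust
use std::rc::Rc;

// Local stand-in for the AIMS CowMode; not the ori_arc type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CowMode {
    StaticUnique,
    Dynamic,
    StaticShared,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct Point {
    x: i64,
    y: i64,
}

/// Models `CowMutate { dst, src, field: x, value, cow_mode }`.
/// The returned handle plays the role of `dst`.
fn cow_set_x(mut src: Rc<Point>, value: i64, mode: CowMode) -> Rc<Point> {
    match mode {
        CowMode::StaticUnique => {
            // Uniqueness was proven statically; the model asserts it,
            // whereas emitted code would skip the check entirely.
            Rc::get_mut(&mut src)
                .expect("StaticUnique requires a unique reference")
                .x = value;
            src
        }
        CowMode::Dynamic => {
            // Check at run time: mutate in place if unique, clone if shared.
            Rc::make_mut(&mut src).x = value;
            src
        }
        CowMode::StaticShared => {
            // Always clone, never touch the original.
            let mut cloned = Rc::new((*src).clone());
            Rc::get_mut(&mut cloned).expect("fresh Rc is unique").x = value;
            cloned
        }
    }
}
```

In the Dynamic case with a unique handle, the returned pointer refers to the same allocation as the input; with a shared handle, the other owner keeps observing the old value. Those are the two outcomes the phi node at the merge block selects between.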
- Add dispatch arm for CowMutate in instr_dispatch.rs:

  The instruction dispatch file (codegen/arc_emitter/instr_dispatch.rs) has the main match on ArcInstr variants. Add an ArcInstr::CowMutate { .. } => self.emit_cow_mutate(..) arm.

- Implement emit_cow_mutate() method on ArcIrEmitter:

  Architecture note: The Dynamic mode generates new LLVM basic blocks (unique_bb, shared_bb, merge_bb) with a phi node at the merge point. This is the same pattern used by IsShared + Branch in the existing emitter, but now encapsulated in a single instruction handler. Look at the existing IsShared emission and Branch terminator emission for the exact LLVM builder API calls (build_conditional_branch, build_phi, etc.).

      fn emit_cow_mutate(
          &mut self,
          dst: ArcVarId,
          src: ArcVarId,
          ty: Idx,
          field: u32,
          value: ArcVarId,
          cow_mode: CowMode,
          strategy: RcStrategy,
      ) {
          match cow_mode {
              CowMode::StaticUnique => {
                  // Direct in-place mutation: no check, no branch
                  self.emit_set_field(src, ty, field, value);
                  self.set_var(dst, self.get_var(src));
              }
              CowMode::Dynamic => {
                  // Full COW: check + branch + clone-if-shared + mutate
                  let is_shared = self.emit_is_shared(src);
                  // Create unique_bb, shared_bb, merge_bb
                  // branch on is_shared → shared_bb / unique_bb
                  // unique_bb: mutate in-place, br merge_bb
                  // shared_bb: clone, mutate clone, br merge_bb
                  // merge_bb: phi(src from unique, clone from shared)
                  // set_var(dst, phi_result)
              }
              CowMode::StaticShared => {
                  // Always clone: skip the check
                  let cloned = self.emit_clone(src, ty, strategy);
                  self.emit_set_field(cloned, ty, field, value);
                  self.set_var(dst, cloned);
              }
          }
      }

  Note: emit_set_field, emit_is_shared, and emit_clone are not existing method names. They represent the LLVM operations that must be assembled from the emitter’s existing primitives (GEP, load, store, call to ori_clone_*, etc.). Review the existing Set and IsShared instruction handlers in instr_dispatch.rs for the exact primitives.

- AOT tests in ori_llvm/tests/aot/ (all 3 CowMode variants must be covered):
  - COW mutation on uniquely-owned struct (StaticUnique) → no clone, no branch
  - COW mutation on shared struct (Dynamic) → check + clone-if-shared + mutate
  - COW mutation on always-shared struct (StaticShared) → unconditional clone + mutate
  - COW mutation in loop → correct behavior across iterations (dynamic uniqueness may change per iteration)
  - ORI_CHECK_LEAKS=1 on all AOT tests → zero leaks
  - Dual-exec parity: interpreter and LLVM produce identical results for all COW test programs (the interpreter has no CowMutate, so this verifies semantic equivalence)
  - Debug and release builds produce identical results
- TPR checkpoint: /tpr-review covering 04.3 LLVM emission implementation
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (04.3): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-04.3 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.3: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
04.4 Runtime Intrinsic Evaluation
File(s): compiler/ori_rt/src/ (if needed)
Evaluate whether inlined LLVM IR or a runtime intrinsic is more efficient for compound COW. This is a mandatory profiling step, not an optional bonus.
- Profile the emitted LLVM IR from 04.3 on benchmark programs:
  - Measure branch prediction miss rate on COW Dynamic mode (use perf stat)
  - Measure code size impact of inlined COW sequence vs a function call
  - Compare against ori_rt’s existing COW functions (ori_cow_mutate_list, etc.)
- Record the decision with measurements:
  - If inlined IR is equal or better: close this subsection with a note documenting the measurements. No runtime intrinsic needed.
  - If a runtime intrinsic is measurably better (>5% improvement on COW-heavy benchmarks): implement ori_cow_mutate_field(data, field_offset, new_value, elem_size) in ori_rt with extern "C" ABI and update the LLVM emitter to call it for Dynamic mode.

  This subsection is a profiling gate, not deferred work. The deliverable is either (a) measurements proving inlined IR is sufficient, or (b) the runtime intrinsic implementation.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (04.4): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-04.4 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 04.4: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
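The 04.4 decision rule (adopt the intrinsic only when it is more than 5% better on COW-heavy benchmarks) can be written down as a small check. This is a sketch under assumptions: the names `Decision`, `relative_improvement`, and `decide` are hypothetical, and the inputs are assumed to be mean wall-clock times in nanoseconds for the same benchmark, which the plan does not specify.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    KeepInlineIr,
    AddRuntimeIntrinsic,
}

/// Fractional speedup of the intrinsic relative to inlined IR.
/// Positive means the intrinsic is faster; 0.10 means 10% faster.
fn relative_improvement(inline_ns: f64, intrinsic_ns: f64) -> f64 {
    (inline_ns - intrinsic_ns) / inline_ns
}

/// Applies the ">5% improvement" gate from the plan.
fn decide(inline_ns: f64, intrinsic_ns: f64) -> Decision {
    const THRESHOLD: f64 = 0.05;
    if relative_improvement(inline_ns, intrinsic_ns) > THRESHOLD {
        Decision::AddRuntimeIntrinsic
    } else {
        Decision::KeepInlineIr
    }
}
```

For instance, 100 ns inlined versus 90 ns with the intrinsic is a 10% improvement and clears the gate, while 100 ns versus 97 ns (3%) does not. Either outcome should be recorded together with the raw measurements.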
04.R Third Party Review Findings
- None.
04.N Completion Checklist
- CowMutate variant added to ArcInstr with correct defined_var() / used_vars()
- contract_cow_sequences() correctly pattern-matches COW diamonds
- Contraction produces CowMutate with correct cow_mode
- ArcIrEmitter::emit_cow_mutate() generates correct LLVM IR for all 3 modes
- StaticUnique mode emits no branch: direct in-place mutation
- Dynamic mode emits check+branch+clone+mutate sequence
- StaticShared mode emits unconditional clone+mutate
- cow_contractions tracked in SynergyMetrics
- AOT tests verify correct behavior for all 3 modes
- Contraction-enabled and contraction-disabled produce identical program output
- Runtime intrinsic evaluation complete: profiling data recorded, decision documented (inline IR sufficient OR intrinsic implemented)
- ORI_CHECK_LEAKS=1 reports zero leaks on all test programs
- ./test-all.sh green (debug + release)
- No spurious warnings in normal compilation
- Plan annotation cleanup: bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 04 returns 0 annotations
- All intermediate TPR checkpoint findings resolved
- /tpr-review passed: independent Codex review found no critical or major issues
- /impl-hygiene-review passed: hygiene review clean
- /improve-tooling retrospective completed: MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which diagnostics/ scripts you ran, which command sequences you repeated, where you added ad-hoc dbg!/tracing calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE /commit-push. The retrospective is mandatory even when nothing felt painful; that is exactly when blind spots accumulate. See .claude/skills/improve-tooling/SKILL.md “Retrospective Mode” for the full protocol.
Exit Criteria: ORI_LOG=ori_arc=info ori build shows cow_contractions > 0 on programs with COW mutations. All 3 CowMode variants emit correct LLVM IR verified by AOT tests. ORI_CHECK_LEAKS=1 reports zero leaks. Program behavior identical with and without contraction.