Section 05: PRE-Style Global RC Code Motion
Status: Not Started
Goal: Add a bidirectional dataflow pass that moves RcInc/RcDec operations across basic block boundaries to find optimal placement points, eliminating partially redundant RC operations. Uses LLVM’s bottom-up/top-down pattern with path-counting for CFG hazard detection and Swift’s region-awareness for loops.
Context: Sections 02-03 optimize RC operations within basic blocks (selective barriers, nested pair elimination). This section extends to cross-block optimization. LLVM’s ARC optimizer performs a bidirectional dataflow: bottom-up from releases to find matching retains, top-down from retains to find matching releases. Both directions must agree for a pair to be eliminated. Path counting ensures operations aren’t moved through loop back-edges (which would change execution count). This is the most complex section — it implements a new analysis pass with bidirectional dataflow, CFG hazard checking, and safe code motion.
Key design principle (from Codex): “pair safe” (KnownSafe — can this pair be eliminated?) is separate from “motion safe” (CFGHazardAfflicted — can we move the ops to do it?). LLVM encodes this as two independent flags on PtrState. A pair is eliminated only when BOTH are satisfied.
Reference implementations:
- LLVM ObjCARCOpts.cpp:1394-1637: VisitBottomUp() and VisitTopDown(), a two-pass bidirectional dataflow with per-block state merging.
- LLVM ObjCARCOpts.cpp:177-325: BBState with TopDownPathCount / BottomUpPathCount, path counting for CFG hazard detection. Overflow → conservative.
- LLVM ObjCARCOpts.cpp:1812-1960: PairUpRetainsAndReleases(), the matching algorithm with ReverseInsertPts for code motion.
- Swift GlobalARCSequenceDataflow.h:33-113: Region-based analysis with loop summarization. ARCRegionState tracks per-region state.
Depends on: Section 01 (statistics), Section 02 (barrier information), Section 03 (KnownSafe flags).
Recommended subsection implementation order: 05.1 → 05.5 → 05.2 → 05.3 → 05.4 → 05.6. Rationale: 05.5 (CFG hazard detection / path counting) must be available before 05.2 and 05.3 can populate the cfg_hazard flag, and 05.4 consumes that flag. Alternatively, integrate path-count logic directly into the BU/TD traversals (05.2/05.3), which eliminates 05.5 as a separate step.
Data flow from dependencies:
- Section 01: SynergyMetrics fields for tracking rc_pairs_eliminated / rc_ops_moved.
- Section 02: callee_may_observe_rc() function from coalesce/mod.rs, and the MemoryContract map from config.contracts in run_aims_pipeline().
- Section 03: KnownSafeAnalysis results. The RC motion pass must receive the KnownSafeAnalysis struct computed in Section 03 (stored as a local in run_aims_pipeline() between the Section 03 pass and this pass). The VarRcInfo::known_safe flag is read during pair matching (05.4).
05.1 Define Per-Block RC State Model
File(s): compiler/ori_arc/src/aims/rc_motion/mod.rs (new module)
Define the state tracked per basic block and per variable during bidirectional traversal.
- Create compiler/ori_arc/src/aims/rc_motion/ directory
- Add pub mod rc_motion; to compiler/ori_arc/src/aims/mod.rs (alongside the existing pub mod knownsafe; from Section 03)
- Define RcSequence (analogous to LLVM’s Sequence enum):

    /// State of an RC operation sequence for a single variable.
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum RcSequence {
        /// No RC operation seen yet.
        None,
        /// RcInc seen (bottom-up: potential release match point).
        Inc,
        /// An instruction that might alter the variable's refcount was seen.
        CanRelease,
        /// A use of the variable was seen (requires positive refcount).
        Use,
        /// Sequence is stopped (can't optimize further).
        Stop,
        /// A movable RcDec was seen (can potentially be moved).
        MovableDec,
    }

- Define VarMotionState:

    /// Per-variable state for RC code motion analysis.
    #[derive(Clone, Debug)]
    pub struct VarMotionState {
        /// Current position in the retain/release sequence.
        pub seq: RcSequence,
        /// KnownSafe from Section 03: refcount known positive.
        pub known_safe: bool,
        /// CFG hazard: motion is unsafe across this point.
        pub cfg_hazard: bool,
        /// Insertion points for the reverse direction's operations.
        pub reverse_insert_pts: SmallVec<[ArcBlockId; 2]>,
        /// The RC instructions that form this pair.
        pub pair_insts: SmallVec<[(ArcBlockId, usize); 2]>,
    }

- Define BlockMotionState (analogous to LLVM’s BBState):

    pub struct BlockMotionState {
        /// Top-down path count (from function entry to this block).
        pub td_path_count: u32,
        /// Bottom-up path count (from this block to function exits).
        pub bu_path_count: u32,
        /// Per-variable top-down state.
        pub top_down: FxHashMap<ArcVarId, VarMotionState>,
        /// Per-variable bottom-up state.
        pub bottom_up: FxHashMap<ArcVarId, VarMotionState>,
    }

- Implement merge operations:
  - init_from_pred() / merge_pred() for top-down (analogous to LLVM BBState::InitFromPred / MergePred)
  - init_from_succ() / merge_succ() for bottom-up
  - Path count overflow detection (set to OVERFLOW_VALUE = 0xFFFF_FFFF)
- Unit tests for state model and merging.
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (05.1): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-05.1 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 05.1: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
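For the merge operations listed in 05.1, the following is a minimal sketch of how init_from_pred() and merge_pred() might behave on a reduced state. It is not the planned implementation: `BlockState`, the three-variant `Seq`, plain `u32` variable ids, and `std::collections::HashMap` are stand-ins chosen for illustration, and the "disagreeing states reset to None" rule is a deliberately coarse assumption that is more conservative than LLVM's sequence merge.

```rust
use std::collections::HashMap;

/// Sentinel meaning "path count overflowed; treat this block conservatively".
pub const OVERFLOW_VALUE: u32 = 0xFFFF_FFFF;

/// Reduced stand-in for RcSequence (hypothetical, three states only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Seq {
    None,
    Inc,
    Use,
}

/// Reduced stand-in for BlockMotionState (top-down half only).
#[derive(Clone, Debug, Default)]
pub struct BlockState {
    pub td_path_count: u32,
    pub top_down: HashMap<u32, Seq>,
}

impl BlockState {
    /// First predecessor: copy its state wholesale.
    pub fn init_from_pred(&mut self, pred: &BlockState) {
        self.td_path_count = pred.td_path_count;
        self.top_down = pred.top_down.clone();
    }

    /// Later predecessors: add path counts, keep only agreeing states.
    pub fn merge_pred(&mut self, pred: &BlockState) {
        if self.td_path_count == OVERFLOW_VALUE {
            return; // already conservative, nothing left to merge
        }
        match self.td_path_count.checked_add(pred.td_path_count) {
            Some(sum) if sum != OVERFLOW_VALUE => self.td_path_count = sum,
            _ => {
                // Overflow: drop all per-variable state so nothing is moved.
                self.td_path_count = OVERFLOW_VALUE;
                self.top_down.clear();
                return;
            }
        }
        // Variables the predecessor does not track count as `Seq::None`.
        for (var, seq) in self.top_down.iter_mut() {
            let other = pred.top_down.get(var).copied().unwrap_or(Seq::None);
            if *seq != other {
                *seq = Seq::None;
            }
        }
        // Variables only the predecessor tracks also merge with `None`.
        for var in pred.top_down.keys() {
            self.top_down.entry(*var).or_insert(Seq::None);
        }
    }
}
```

Merging two predecessors that each carry a path count of 1 yields a count of 2, and a variable survives the merge only if both predecessors report the same sequence state. The bottom-up pair init_from_succ() / merge_succ() would mirror this over successors.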
05.2 Implement Bottom-Up Pass
File(s): compiler/ori_arc/src/aims/rc_motion/bottom_up.rs (new file)
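Walk the CFG bottom-up in post-order, so that each block is visited after its successors (back-edges aside), tracking release sequences.
Reuse existing infrastructure: compiler/ori_arc/src/graph/mod.rs provides compute_postorder() (post-order traversal), compute_predecessors(), and dominator tree construction. The post-order result is the order a backward dataflow needs, because a block's bottom-up state is merged from its successors; reversing it yields reverse post-order (RPO), which is the order for the top-down pass in 05.3. Use these rather than reimplementing CFG traversal.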
- Implement visit_bottom_up():

    /// Bottom-up pass: walk from exits to entry, tracking RcDec sequences.
    ///
    /// For each block (in post-order, so successors are visited first):
    /// 1. Merge bottom-up state from all successors
    /// 2. Walk instructions backward
    /// 3. When RcDec seen: init sequence
    /// 4. When RcInc seen: try to match with existing sequence → record pair
    /// 5. When use/call seen: update sequence state (Use, CanRelease, Stop)
    pub fn visit_bottom_up(
        func: &ArcFunction,
        states: &mut FxHashMap<ArcBlockId, BlockMotionState>,
        contracts: &FxHashMap<Name, MemoryContract>,
    ) -> Vec<RcPair>

- Handle instruction effects using Section 02’s barrier logic:
  - Apply/ApplyIndirect: consult MemoryContract for selective barrier
  - IsShared: use of variable (must have positive refcount)
  - Project: use of source variable
  - Set: use + potential refcount observation
- Bail out if per-block state grows too large (LLVM’s MaxPtrStates = 4095).
- Unit tests (TDD: write BEFORE implementing visit_bottom_up(); tests must fail initially): linear block, diamond CFG, loop (verifies back-edge handling), irreducible CFG (verifies conservative behavior).
- TPR checkpoint: /tpr-review covering 05.1–05.2 implementation work
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (05.2): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-05.2 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 05.2: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
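To make the backward walk in 05.2 concrete, here is a simplified single-block sketch of steps 2 to 5. Everything in it is an assumption for illustration: the `Inst` enum is a toy IR (not ArcFunction), `Call` stands for any instruction that may decrement a refcount, variables are plain `u32` ids, and a sequence that reaches `CanRelease` is simply not reported, whereas the real pass would record it and let 05.4 decide using KnownSafe.

```rust
use std::collections::HashMap;

/// Toy instruction set (hypothetical; the real pass walks ArcFunction).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Inst {
    RcInc(u32),
    RcDec(u32),
    Use(u32),
    /// Opaque call that may decrement any refcount.
    Call,
}

/// Reduced bottom-up sequence states.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Seq {
    MovableDec,
    Use,
    CanRelease,
}

/// (index of RcInc, index of RcDec, variable)
pub type Pair = (usize, usize, u32);

/// Walk one block backward and report inc/dec pairs with no
/// "possible release followed by a use" between them.
pub fn bottom_up_block(insts: &[Inst]) -> Vec<Pair> {
    // Open sequences: variable -> (state, index of the RcDec that started it).
    let mut open: HashMap<u32, (Seq, usize)> = HashMap::new();
    let mut pairs = Vec::new();
    for (idx, inst) in insts.iter().enumerate().rev() {
        match *inst {
            Inst::RcDec(v) => {
                open.insert(v, (Seq::MovableDec, idx));
            }
            Inst::Use(v) => {
                if let Some(entry) = open.get_mut(&v) {
                    if entry.0 == Seq::MovableDec {
                        entry.0 = Seq::Use;
                    }
                }
            }
            Inst::Call => {
                // A possible decrement above a later use: the inc guards that use.
                for entry in open.values_mut() {
                    if entry.0 == Seq::Use {
                        entry.0 = Seq::CanRelease;
                    }
                }
            }
            Inst::RcInc(v) => {
                if let Some((seq, dec_idx)) = open.remove(&v) {
                    if seq != Seq::CanRelease {
                        pairs.push((idx, dec_idx, v));
                    }
                }
            }
        }
    }
    pairs
}
```

For the block `[RcInc(1), Use(1), RcDec(1)]` the sketch reports the pair `(0, 2, 1)`. Inserting a `Call` between the inc and the use leaves the list empty, because the inc is then what keeps the variable alive for the use.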
05.3 Implement Top-Down Pass
File(s): compiler/ori_arc/src/aims/rc_motion/top_down.rs (new file)
Walk the CFG top-down in reverse post-order, so that each block is visited after its predecessors (back-edges aside), tracking retain sequences.
- Implement visit_top_down():

    /// Top-down pass: walk from entry to exits, tracking RcInc sequences.
    ///
    /// Mirror of bottom-up: when RcInc seen, init sequence.
    /// When RcDec seen, try to match. Use/call updates state.
    pub fn visit_top_down(
        func: &ArcFunction,
        states: &mut FxHashMap<ArcBlockId, BlockMotionState>,
        contracts: &FxHashMap<Name, MemoryContract>,
    ) -> Vec<RcPair>

- Use same barrier logic as bottom-up (Section 02 integration).
- Unit tests (TDD: write BEFORE implementing visit_top_down(); mirror of bottom-up tests).
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (05.3): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-05.3 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 05.3: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
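Both traversals depend on block ordering, so a small sketch of the two orders may help. The code below is illustrative only: it works on a CFG given as adjacency lists with block 0 as the entry, and the names `postorder` and `reverse_postorder` are hypothetical stand-ins, not the signatures of the existing compute_postorder() helper.

```rust
/// Depth-first post-order over a CFG given as adjacency lists
/// (`succs[b]` lists the successors of block `b`; block 0 is the entry).
pub fn postorder(succs: &[Vec<usize>]) -> Vec<usize> {
    if succs.is_empty() {
        return Vec::new();
    }
    let mut visited = vec![false; succs.len()];
    let mut order = Vec::with_capacity(succs.len());
    // Explicit stack of (block, index of next successor to try),
    // which avoids recursion on deep CFGs.
    let mut stack = vec![(0usize, 0usize)];
    visited[0] = true;
    while let Some((block, next)) = stack.last_mut() {
        if let Some(&s) = succs[*block].get(*next) {
            *next += 1;
            if !visited[s] {
                visited[s] = true;
                stack.push((s, 0));
            }
        } else {
            // All successors handled: the block is finished.
            order.push(*block);
            stack.pop();
        }
    }
    order
}

/// Reverse post-order: every block appears after its predecessors
/// along forward edges, which is what a top-down dataflow needs.
pub fn reverse_postorder(succs: &[Vec<usize>]) -> Vec<usize> {
    let mut order = postorder(succs);
    order.reverse();
    order
}
```

On the diamond `0 → {1, 2}`, `1 → 3`, `2 → 3`, the post-order is `[3, 1, 2, 0]` (the merge block first, suitable for the bottom-up pass) and the reverse post-order is `[0, 2, 1, 3]` (the entry first, suitable for the top-down pass).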
05.4 Pair Up and Place
File(s): compiler/ori_arc/src/aims/rc_motion/placement.rs (new file)
Match bottom-up pairs with top-down pairs and compute optimal placement.
Execution order dependency: This subsection consumes cfg_hazard flags from VarMotionState (condition 5 below). These flags must be set by 05.5’s check_cfg_hazards() BEFORE pair_up_and_place() runs. Implementation order: 05.1 → 05.5 → 05.2 → 05.3 → 05.4 → 05.6. Alternatively, integrate path-count computation into the BU/TD passes (05.2/05.3) directly, which is what LLVM does — path counts are accumulated during the dataflow traversal, not in a separate pass.
- Implement pair_up_and_place():

    /// Match retain/release pairs from both passes and compute placement.
    ///
    /// A pair is eliminable when:
    /// 1. Bottom-up found a release matching a retain (same variable)
    /// 2. Top-down found a retain matching a release (same variable)
    /// 3. Both agree on the pair
    /// 4. KnownSafe is true (from Section 03) OR both paths are safe
    /// 5. CFGHazardAfflicted is false (path count check passes)
    ///
    /// For eliminable pairs: remove both instructions.
    /// For movable-but-not-eliminable: compute insertion points and move.
    pub fn pair_up_and_place(
        func: &mut ArcFunction,
        bu_pairs: &[RcPair],
        td_pairs: &[RcPair],
        states: &FxHashMap<ArcBlockId, BlockMotionState>,
    ) -> usize // pairs eliminated or moved

- Movement rules:
  - Inc can move down toward its matching Dec (delay retain)
  - Dec can move up toward its matching Inc (early release)
  - Neither can move through a loop back-edge (path count mismatch)
  - Neither can move past a use that requires positive refcount
- Unit tests (TDD: write BEFORE implementing pair_up_and_place()): elimination in diamond, movement in triangle, loop preservation (back-edge blocks motion).
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (05.4): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-05.4 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 05.4: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
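The agreement check in conditions 3 to 5 of pair_up_and_place() can be sketched as a pure function over the two pair lists. This is a sketch under stated assumptions: `PairKey` and `PairFacts` are hypothetical shapes (the plan does not define RcPair's fields), the flags are combined the way LLVM combines them (known-safe only if both directions say so, hazard if either does), and only the KnownSafe branch of condition 4 is modelled.

```rust
use std::collections::HashMap;

/// Identity of a candidate pair (hypothetical layout):
/// variable id plus (block, instruction index) of the inc and the dec.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PairKey {
    pub var: u32,
    pub inc: (u32, usize),
    pub dec: (u32, usize),
}

/// What one direction of the dataflow concluded about a pair.
#[derive(Clone, Copy, Debug)]
pub struct PairFacts {
    pub key: PairKey,
    pub known_safe: bool,
    pub cfg_hazard: bool,
}

/// Keys of pairs that both passes found, that are known-safe in both
/// directions, and that carry no CFG hazard in either direction.
pub fn eliminable(bu: &[PairFacts], td: &[PairFacts]) -> Vec<PairKey> {
    let td_by_key: HashMap<PairKey, PairFacts> =
        td.iter().map(|p| (p.key, *p)).collect();
    bu.iter()
        .filter_map(|b| {
            // Condition 3: the top-down pass must report the same pair.
            let t = td_by_key.get(&b.key)?;
            // "Pair safe" and "motion safe" stay separate flags.
            let known_safe = b.known_safe && t.known_safe;
            let cfg_hazard = b.cfg_hazard || t.cfg_hazard;
            (known_safe && !cfg_hazard).then_some(b.key)
        })
        .collect()
}
```

Keeping the result as a list of keys, rather than mutating the function directly, makes the decision testable in isolation; the actual removal or relocation of instructions would be a separate step over `&mut ArcFunction`.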
05.5 CFG Hazard Detection
File(s): compiler/ori_arc/src/aims/rc_motion/hazards.rs (new file)
Implement path-count-based CFG hazard detection (from LLVM’s BBState).
- Implement check_cfg_hazards():

    /// Check for CFG structures where moving RC ops would change execution count.
    ///
    /// Uses path counting: TopDownPathCount × BottomUpPathCount gives the
    /// number of unique paths through a block. If moving an RC op would
    /// cause it to execute on more paths than intended, the motion is unsafe.
    ///
    /// Overflow handling: if path count exceeds u32::MAX, fall back to
    /// conservative (no motion for that block).
    pub fn check_cfg_hazards(
        func: &ArcFunction,
        states: &mut FxHashMap<ArcBlockId, BlockMotionState>,
    )

- Compute path counts:
  - Entry block: td_path_count = 1
  - Exit blocks: bu_path_count = 1
  - Other blocks: td_path_count is the sum of the predecessors’ td_path_count; bu_path_count is the sum of the successors’ bu_path_count
  - Overflow → mark as OVERFLOW_VALUE (05.1) and clear all states for that block
- Mark cfg_hazard = true on VarMotionState when path counts indicate motion is unsafe:
  - Loop back-edge detected (the successor’s RPO index is not greater than the current block’s, which includes self-loops)
  - Path count product overflows
  - Not all successors have same sequence state (asymmetric CFG)
- Unit tests:
  - Linear CFG → no hazards
  - Diamond CFG → no hazards
  - Loop → hazard on back-edge
  - Irreducible CFG → conservative (all hazards)
- TPR checkpoint: /tpr-review covering 05.3–05.5 implementation work
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (05.5): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-05.5 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 05.5: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
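The path-count rules in 05.5 can be tried out on small graphs with the sketch below. It rests on assumptions that are not in the plan: blocks are already numbered in reverse post-order (block 0 is the entry), the CFG is a plain adjacency list, an edge whose target index is not greater than its source is treated as a back-edge and skipped when summing, and a block with no forward successors is treated as an exit. The function names are hypothetical.

```rust
/// Sentinel for an overflowed path count.
pub const OVERFLOW_VALUE: u32 = 0xFFFF_FFFF;

/// Addition that sticks at the sentinel once it overflows.
fn sat_add(a: u32, b: u32) -> u32 {
    a.checked_add(b).unwrap_or(OVERFLOW_VALUE)
}

/// Edges (from, to) whose target does not come later in RPO.
pub fn back_edges(succs: &[Vec<usize>]) -> Vec<(usize, usize)> {
    let mut edges = Vec::new();
    for (block, targets) in succs.iter().enumerate() {
        for &s in targets {
            if s <= block {
                edges.push((block, s));
            }
        }
    }
    edges
}

/// Number of forward paths from the entry to each block.
pub fn td_path_counts(succs: &[Vec<usize>]) -> Vec<u32> {
    let mut counts = vec![0u32; succs.len()];
    if let Some(entry) = counts.first_mut() {
        *entry = 1;
    }
    // RPO numbering guarantees forward predecessors are final by now.
    for block in 0..succs.len() {
        let from = counts[block];
        for &s in &succs[block] {
            if s > block {
                counts[s] = sat_add(counts[s], from);
            }
        }
    }
    counts
}

/// Number of forward paths from each block to an exit.
pub fn bu_path_counts(succs: &[Vec<usize>]) -> Vec<u32> {
    let mut counts = vec![0u32; succs.len()];
    for block in (0..succs.len()).rev() {
        let mut forward = succs[block]
            .iter()
            .copied()
            .filter(|&s| s > block)
            .peekable();
        let value = if forward.peek().is_none() {
            1 // exit, or a block whose only successors are back-edges
        } else {
            forward.fold(0, |acc, s| sat_add(acc, counts[s]))
        };
        counts[block] = value;
    }
    counts
}
```

For the diamond `0 → {1, 2}`, `1 → 3`, `2 → 3`, the top-down counts are `[1, 1, 1, 2]` and the bottom-up counts are `[2, 1, 1, 1]`, so two paths run through the entry and the merge block and one through each arm. For the loop `0 → 1 → 2 → {1, 3}`, the only back-edge reported is `(2, 1)`, which is where cfg_hazard would be raised.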
05.6 Integration and Matrix Testing
File(s): compiler/ori_arc/src/pipeline/aims_pipeline.rs, tests
Integrate the RC motion pass into the AIMS pipeline.
- Insert RC motion pass into run_aims_pipeline():
  - After KnownSafe pass (Section 03) and before verify_and_merge() which contains steps 6-9 (verify, AIMS-verify, tail calls, unwind cleanup, merge_blocks)
  - This ensures RC motion operates on the pre-merge IR with valid block structure
  - RC motion modifies RC instructions (removes/moves RcInc/RcDec), so verify() (step 6 inside verify_and_merge) will validate the modified IR. No separate re-verification is needed since verify_and_merge runs after RC motion.

  Data threading: The RC motion pass needs:
  - &mut ArcFunction: the post-emission IR (mutable, because pair_up_and_place() removes and moves instructions)
  - contracts: &FxHashMap<Name, MemoryContract>: from config.contracts (for callee_may_observe_rc())
  - analysis: &KnownSafeAnalysis: from Section 03’s pass (stored as a local between the KnownSafe and RC motion invocations in run_aims_pipeline())
  - metrics: &mut SynergyMetrics: to record rc_pairs_eliminated and rc_ops_moved

  Prerequisite: aims_pipeline.rs splitting (see overview Prerequisites table). The pipeline file is 590 lines, already over the 500-line limit. This extraction must be done before adding the RC motion invocation. If Section 03 or 04 is implemented first, they will have already completed this.
- Add rc_pairs_eliminated and rc_ops_moved to SynergyMetrics
- Matrix testing:
  - Type dimension: str, [int], Option<str>, closures, maps, nested structs
  - CFG dimension: linear, diamond, triangle, loop (while), loop (for), nested loop, match arms, early return, try/catch
  - RC pattern dimension: retain-release in same block (no motion needed), retain in pred + release in succ (motion across diamond), retain outside loop + release inside (should NOT move), nested function calls
- Correctness verification:
  - ORI_CHECK_LEAKS=1 on all test programs → zero leaks
  - ORI_TRACE_RC=1 comparison before/after → same logical RC balance
  - Dual-exec parity: interpreter and LLVM produce identical results
  - Debug and release builds produce identical results
- Measurement:
  - ORI_LOG=ori_arc=info ori build tests/spec/ → report rc_pairs_eliminated, rc_ops_moved
  - Compare total RC operations before/after (using Section 01 statistics)
  - Target: 10-25% additional RC reduction on programs with cross-block RC patterns
- Stress testing:
  - Programs with 100+ basic blocks
  - Deeply nested control flow (10+ levels)
  - Programs with complex loop nesting
  - Verify no exponential blowup in analysis time
- Verify tests pass in debug and release
Semantic pin: Test that a diamond CFG with RcInc(x) in the entry block and RcDec(x) in both successors (same variable, no intervening use in the merge block) results in rc_pairs_eliminated > 0. This pattern is only optimizable with cross-block motion — single-block passes cannot see it.
Negative pin: Test that a loop containing RcInc(x) in the body does NOT get its inc moved outside the loop (path count mismatch would change execution count). Assert rc_ops_moved == 0 for a loop-body-only RC pattern. A second negative pin: test that a CFG with a use between the inc and dec that requires positive refcount does NOT have the pair eliminated — the use would fault if the pair were removed.
- TPR checkpoint: /tpr-review covering 05.6 integration and full Section 05 implementation
- /tpr-review passed: independent review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: hygiene review clean. MUST run AFTER /tpr-review is clean.
- Subsection close-out (05.6): MANDATORY before starting the next subsection. Run /improve-tooling retrospectively on THIS subsection’s debugging journey (per .claude/skills/improve-tooling/SKILL.md “Per-Subsection Workflow”): which diagnostics/ scripts you ran, where you added dbg!/tracing calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE /commit-push using a valid conventional-commit type (build(diagnostics): ... — surfaced by section-05.6 retrospective; build/test/chore/ci/docs are valid; tools(...) is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 05.6: no tooling gaps”. Update this subsection’s status in section frontmatter to complete.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
05.R Third Party Review Findings
- None.
05.N Completion Checklist
- RcSequence state machine with correct transitions
- BlockMotionState with path counting and overflow detection
- Bottom-up pass correctly identifies release sequences
- Top-down pass correctly identifies retain sequences
- pair_up_and_place() correctly matches and eliminates/moves pairs
- CFG hazard detection prevents motion through loop back-edges
- Path count overflow → conservative (no motion)
- Section 02 barrier information used for selective interference queries
- Section 03 KnownSafe flags consumed correctly
- rc_pairs_eliminated and rc_ops_moved tracked in SynergyMetrics
- No exponential blowup on large CFGs (bail out at MaxPtrStates)
- ORI_CHECK_LEAKS=1 reports zero leaks on all test programs
- ORI_TRACE_RC=1 shows balanced RC counts before/after
- Dual-exec parity maintained (interpreter == LLVM)
- ./test-all.sh green (debug + release)
- No spurious warnings in normal compilation
- Plan annotation cleanup: bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 05 returns 0 annotations
- All intermediate TPR checkpoint findings resolved
- /tpr-review passed: independent Codex review found no critical or major issues
- /impl-hygiene-review passed: hygiene review clean
- /improve-tooling retrospective completed: MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which diagnostics/ scripts you ran, which command sequences you repeated, where you added ad-hoc dbg!/tracing calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE /commit-push. The retrospective is mandatory even when nothing felt painful, since that is exactly when blind spots accumulate. See .claude/skills/improve-tooling/SKILL.md “Retrospective Mode” for the full protocol.
Exit Criteria: ORI_LOG=ori_arc=info ori build shows rc_pairs_eliminated > 0 on programs with cross-block RC patterns (e.g., retain in one branch, release in another of a diamond). CFG hazard detection correctly prevents motion through loops. All existing tests pass. ORI_CHECK_LEAKS=1 reports zero leaks. Analysis completes in O(n × vars × blocks) time without exponential blowup.