Section 01: Block Merging & CFG Simplification
Status: Complete
Goal: Every basic block in emitted LLVM IR has a reason to exist. Required entry/exit/unwind blocks are allowed, but avoidable bridge blocks are not. No trivial br label %next between sequential let-bindings. No 4-block diamonds for select-eligible if/else expressions.
Context: This is the most pervasive structural finding across the journey set. Redundant unconditional branches were confirmed in J1, J2, J5, J6, J7, J8, and J12, and similar trivial bridge patterns also appear in targeted functions from later journeys. The current ARC-to-LLVM lowering path often materializes extra blocks even when no control-flow divergence exists. This inflates IR size and makes IR-level debugging harder.
Additional scope: Match arm codegen also creates redundant single-instruction br blocks (documented by compiler/ori_llvm/tests/aot/ir_quality.rs). These should be treated the same as let-binding boundary blocks — sequential arms that produce a value and branch to merge should not require intermediate blocks.
Journeys affected: J1 (1 redundant branch), J2 (3), J5 (2), J6 (4), J7 (2+bridges), J8 (3), J10, J11, J12 (multiple each).
Architecture note: The LLVM arc_emitter creates one LLVM block per ARC IR block via block_map. The redundant blocks are primarily introduced in ARC lowering (compiler/ori_arc/src/lower/) before LLVM emission. Fixes can target:
- (a) The ARC lowerer: avoid creating blocks for sequential operations (preferred, fixes at source)
- (b) A post-lowering ARC block-merging pass: merge trivial br-only blocks before LLVM emission
- (c) The LLVM emitter: detect trivial ARC blocks and inline them into predecessors during emission
Reference implementations:
- Rust rustc_codegen_llvm/mir/block.rs: Merges sequential MIR basic blocks during codegen; only creates new LLVM blocks when MIR has actual control flow.
- Zig src/Sema.zig: Emits minimal basic blocks; sequential operations stay in the same block.
01.1 Sequential Block Merging at Let-Binding Boundaries
File(s): compiler/ori_arc/src/lower/ (primary — ARC block creation), compiler/ori_llvm/src/codegen/arc_emitter/mod.rs (secondary — LLVM emission)
The ARC lowerer creates a new ARC basic block for each let-binding expression, even when no control flow divergence occurs. The LLVM emitter then faithfully creates one LLVM block per ARC block (1:1 via block_map). The fix should either prevent creating these blocks in the lowerer, merge them in a post-lowering pass, or skip them during LLVM emission.
- Implement using approach (b): post-lowering ARC block merge pass
- Trace the ARC lowerer to identify where blocks are created for sequential operations (let-bindings, assignments, drops); look for new_block() or push_block() patterns in ori_arc/src/lower/
- Choose approach: (a) avoid creation in lowerer, (b) post-lowering merge pass, or (c) emitter-level skip. Chose (b): post-lowering ARC block merge pass (approach (c) was attempted and reverted; see below)
Approach (c) reverted: Emitter-level block_map aliasing is fundamentally incompatible with instructions that create internal LLVM basic blocks. RcInc/RcDec on fat pointers (strings, lists) emit inline SSO/null-check conditionals that create internal blocks (rc_inc.sso_skip, rc_dec.heap, etc.) and move the LLVM builder away from the original block. When the merged block’s instructions are emitted into the aliased LLVM block, they appear after a terminator mid-block, and the self-loop detection (current_block == target) fails because the builder is at an internal block, not the aliased entry. This caused 113 AOT test failures. The correct approach is (b): merge trivial blocks in the ARC IR before LLVM emission, so the emitter sees a single block with all instructions inline.
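The idea behind approach (b) can be sketched on a toy CFG. This is a minimal illustration, not the ori_arc implementation: the Block and Terminator types, the string instructions, and the function names merge_jump_chains and compact are all hypothetical stand-ins. A block B is folded into A when A ends in an unconditional jump to B, B is not the entry block, and A is B's only predecessor; the emitter then sees one block with all instructions inline.

```rust
// Hypothetical toy IR; the real ARC IR types are not shown in this document.
#[derive(Debug, Clone, PartialEq)]
enum Terminator {
    Jump(usize),
    Branch { then_block: usize, else_block: usize },
    Return,
}

#[derive(Debug, Clone, PartialEq)]
struct Block {
    instrs: Vec<String>,
    term: Terminator,
}

fn successors(term: &Terminator) -> Vec<usize> {
    match *term {
        Terminator::Jump(t) => vec![t],
        Terminator::Branch { then_block, else_block } => vec![then_block, else_block],
        Terminator::Return => vec![],
    }
}

/// Repeatedly folds B into A when A ends in `Jump(B)`, B is not the entry
/// block (index 0), and A is B's only predecessor. Runs to a fixed point,
/// then drops unreachable blocks and renumbers the rest.
fn merge_jump_chains(mut blocks: Vec<Block>) -> Vec<Block> {
    loop {
        // Count incoming edges per block.
        let mut preds = vec![0usize; blocks.len()];
        for block in &blocks {
            for s in successors(&block.term) {
                preds[s] += 1;
            }
        }
        let candidate = (0..blocks.len()).find_map(|a| match blocks[a].term {
            Terminator::Jump(b) if b != a && b != 0 && preds[b] == 1 => Some((a, b)),
            _ => None,
        });
        let Some((a, b)) = candidate else { break };
        // Move B's instructions and terminator into A; B becomes an empty,
        // unreachable block that `compact` removes afterwards.
        let absorbed = std::mem::replace(
            &mut blocks[b],
            Block { instrs: Vec::new(), term: Terminator::Return },
        );
        blocks[a].instrs.extend(absorbed.instrs);
        blocks[a].term = absorbed.term;
    }
    compact(blocks)
}

/// Removes blocks unreachable from the entry and rewrites jump targets.
fn compact(blocks: Vec<Block>) -> Vec<Block> {
    if blocks.is_empty() {
        return blocks;
    }
    let mut reachable = vec![false; blocks.len()];
    let mut stack = vec![0usize];
    while let Some(i) = stack.pop() {
        if !std::mem::replace(&mut reachable[i], true) {
            stack.extend(successors(&blocks[i].term));
        }
    }
    let mut remap = vec![usize::MAX; blocks.len()];
    let mut next = 0;
    for i in 0..blocks.len() {
        if reachable[i] {
            remap[i] = next;
            next += 1;
        }
    }
    blocks
        .into_iter()
        .enumerate()
        .filter(|(i, _)| reachable[*i])
        .map(|(_, mut block)| {
            block.term = match block.term {
                Terminator::Jump(t) => Terminator::Jump(remap[t]),
                Terminator::Branch { then_block, else_block } => Terminator::Branch {
                    then_block: remap[then_block],
                    else_block: remap[else_block],
                },
                Terminator::Return => Terminator::Return,
            };
            block
        })
        .collect()
}
```

Three sequential let-binding blocks chained by jumps collapse to a single block under this sketch, while a diamond whose merge block has two predecessors is left untouched. Block params, invokes, and RC instructions are deliberately left out here.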
01.1 Completion Checklist
- No avoidable branch-only bridge blocks between sequential let-bindings in audited journey functions
- Match arm codegen produces no redundant single-instruction br blocks for sequential arms
- IR test: function with 3+ sequential let bindings emits a single basic block (no intermediate br label)
- IR test: match with 3+ value-producing arms has no trivial bridge blocks between arm and merge
- compiler/ori_llvm/tests/aot/ir_quality.rs tests updated for block merging scope
- ./test-all.sh green
- ./clippy-all.sh green
- No regressions in cargo test -p ori_llvm
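The IR-level checks in this checklist amount to scanning emitted LLVM IR text for labeled blocks whose only instruction is an unconditional br. The helper below is a hypothetical sketch of such a scan; count_bridge_blocks and is_bridge are illustrative names, not functions known to exist in ir_quality.rs. It is a line-based heuristic that assumes one instruction per line and treats everything after ';' as a comment.

```rust
/// Counts labeled basic blocks whose only instruction is `br label %...`.
/// Heuristic text scan: assumes one instruction per line, labels ending in
/// ':', and a closing '}' at the end of each function body.
fn count_bridge_blocks(ir: &str) -> usize {
    let mut count = 0;
    // Instructions of the labeled block currently being scanned, if any.
    let mut body: Option<Vec<&str>> = None;
    for raw in ir.lines() {
        // Drop trailing comments such as "; preds = %bb0".
        let line = raw.split(';').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let is_label = line.ends_with(':');
        if is_label || line == "}" {
            // A new label or the end of the function closes the current block.
            if let Some(instrs) = body.take() {
                if is_bridge(&instrs) {
                    count += 1;
                }
            }
            if is_label {
                body = Some(Vec::new());
            }
        } else if let Some(instrs) = body.as_mut() {
            instrs.push(line);
        }
    }
    count
}

fn is_bridge(instrs: &[&str]) -> bool {
    instrs.len() == 1 && instrs[0].starts_with("br label ")
}
```

A test for the sequential-let case would then assert that the count is zero for the compiled function. Note that this sketch also counts a labeled entry block consisting of a lone br, which may or may not be desired.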
01.2 Select Lowering for Trivial If/Else
File(s): compiler/ori_arc/src/block_merge/mod.rs (Phase 3: select-fold)
Simple if/else expressions where both branches are trivial values (constants, variable reads — no side effects, no function calls) previously emitted a 4-block diamond pattern (condition → then/else → merge with phi). These are now folded into Select instructions by the ARC block merge pass (Phase 3).
Approach: Added Phase 3 (select-fold) to the block merge pass, between Phase 2 (downgrade trivial invokes) and Phase 4 (merge jump chains). A body is “trivial” when every instruction is Let { Literal } or Let { Var(v) } where v is not defined in the same body. The pass detects 4-block diamond patterns where both arm blocks are trivial and jump to the same merge block, then replaces the Branch with Select instructions and a Jump to the merge block. Dead arm blocks are cleaned up by a compaction sub-step (3b).
Example from J2 my_abs:
; my_abs is NOT select-eligible because negation lowers to
; Let { PrimOp { Unary(Neg) } } — a Let, but not in the trivial
; whitelist (only Literal and Var are whitelisted).
Cases like if x > 0 then a else b (where both branches are plain values) are eligible and now emit select.
- Define “trivial branch” criteria: is_trivial_body() accepts only Let { Literal } or Let { Var(pre-branch) }, no PrimOps, no Apply/Invoke, no RC ops
- Implement select fold in ARC block merge pass (Phase 3, between downgrade and merge)
- Emit select for differing args, Let { Var } passthrough for identical args
- Add test cases for select-eligible and select-ineligible if/else expressions
- Verify: if x > 0 then a else b emits select, if x > 0 then f() else g() emits diamond
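The triviality rule and the fold decision can be sketched as follows. This is a simplified model under stated assumptions: Value, Binding, Instr, Arm, and fold_diamond are hypothetical names, variables are plain integers, and each arm passes exactly one argument to the merge block. Only is_trivial_body is a name taken from the plan above, and its body here is a guess at the described rule rather than the actual code.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for ARC values; Neg models PrimOp { Unary(Neg) }.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Literal(i64),
    Var(u32),
    Neg(u32),
    Call(&'static str),
}

#[derive(Debug, Clone, PartialEq)]
struct Binding {
    dst: u32,
    value: Value,
}

#[derive(Debug, Clone, PartialEq)]
enum Instr {
    Let(Binding),
    Select { dst: u32, cond: u32, if_true: u32, if_false: u32 },
}

/// One arm of the diamond: its bindings plus the variable it passes to the
/// merge block.
struct Arm {
    body: Vec<Binding>,
    arg: u32,
}

/// A body is trivial when every binding is a literal or a read of a variable
/// that is not defined in the same body.
fn is_trivial_body(body: &[Binding]) -> bool {
    let defined: HashSet<u32> = body.iter().map(|b| b.dst).collect();
    body.iter().all(|b| match b.value {
        Value::Literal(_) => true,
        Value::Var(v) => !defined.contains(&v),
        _ => false,
    })
}

/// Returns the straight-line replacement for a diamond, or None when either
/// arm is not trivial and the branch must stay.
fn fold_diamond(cond: u32, merge_param: u32, then_arm: &Arm, else_arm: &Arm) -> Option<Vec<Instr>> {
    if !is_trivial_body(&then_arm.body) || !is_trivial_body(&else_arm.body) {
        return None;
    }
    // Trivial bindings have no side effects, so both arms can be hoisted.
    let mut out: Vec<Instr> = then_arm
        .body
        .iter()
        .chain(&else_arm.body)
        .cloned()
        .map(Instr::Let)
        .collect();
    if then_arm.arg == else_arm.arg {
        // Identical args: the merge param is just an alias.
        out.push(Instr::Let(Binding { dst: merge_param, value: Value::Var(then_arm.arg) }));
    } else {
        out.push(Instr::Select {
            dst: merge_param,
            cond,
            if_true: then_arm.arg,
            if_false: else_arm.arg,
        });
    }
    Some(out)
}
```

Under this model, an arm containing Neg (the my_abs case) or Call makes fold_diamond return None, so the diamond is kept, which matches the eligibility rules listed above.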
01.2 Completion Checklist
- if x > 0 then a else b (both arms are variables/constants) emits select, not a 4-block diamond
- if x > 0 then f() else g() (side-effecting arms) still emits the branch+phi diamond
- if x > 0 then -x else x (negation/PrimOp) still emits diamond (not select)
- IR test: select-eligible if/else produces select and no phi
- IR test: select-ineligible if/else still produces conditional branch
- compiler/ori_llvm/tests/aot/ir_quality.rs tests updated for select lowering scope
- ./test-all.sh green
- ./clippy-all.sh green
- No regressions in cargo test -p ori_llvm
01.3 Single-Predecessor Phi Elimination
File(s): compiler/ori_arc/src/block_merge/single_pred_phi.rs (Phase 5 implementation), compiler/ori_arc/src/block_merge/mod.rs (pipeline wiring)
Phi nodes with only one incoming edge are equivalent to a direct value reference. These appeared when block merging created unnecessary merge points.
Approach: Added Phase 5 (eliminate_single_pred_params) to the block merge pipeline, running after Phase 4’s fixed-point. It handles two cases: (1) Jump predecessors: converts params to Let bindings via lower_parallel_copy and clears Jump args; (2) non-Jump predecessors (Branch/Switch/Invoke): clears dead params directly since these terminators don’t carry args. The phase is single-pass and needs no COW annotation remapping, because the body of the single-predecessor block (B) stays in B. Uses compute_predecessors for direct predecessor lookup.
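The two cases can be illustrated on a toy IR. This is a sketch under assumptions, not the code in single_pred_phi.rs: the Block and Term types are hypothetical, bindings are modeled as (dst, src) pairs, and plain sequential copies stand in for lower_parallel_copy, which is safe here only because every param is assumed to be a fresh SSA name. A Branch whose two arms target the same block is treated as a single predecessor, which is also an assumption.

```rust
// Hypothetical toy IR with block params and jump args.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Jump { target: usize, args: Vec<u32> },
    Branch { then_block: usize, else_block: usize },
    Return,
}

#[derive(Debug, Clone, PartialEq)]
struct Block {
    params: Vec<u32>,
    lets: Vec<(u32, u32)>, // (dst, src), meaning `let dst = src`
    term: Term,
}

/// Distinct predecessor blocks for every block.
fn compute_predecessors(blocks: &[Block]) -> Vec<Vec<usize>> {
    let mut preds = vec![Vec::new(); blocks.len()];
    for (i, block) in blocks.iter().enumerate() {
        let targets = match &block.term {
            Term::Jump { target, .. } => vec![*target],
            Term::Branch { then_block, else_block } => vec![*then_block, *else_block],
            Term::Return => vec![],
        };
        for t in targets {
            if !preds[t].contains(&i) {
                preds[t].push(i);
            }
        }
    }
    preds
}

/// Single pass: for each non-entry block with exactly one predecessor,
/// replace its params with leading `let` copies (Jump predecessor) or drop
/// them as dead (Branch predecessor, which carries no args).
fn eliminate_single_pred_params(blocks: &mut [Block]) {
    let preds = compute_predecessors(blocks);
    for b in 1..blocks.len() {
        if blocks[b].params.is_empty() || preds[b].len() != 1 || preds[b][0] == b {
            continue;
        }
        let p = preds[b][0];
        let params = std::mem::take(&mut blocks[b].params);
        if let Term::Jump { args, .. } = &mut blocks[p].term {
            let args = std::mem::take(args);
            // param_i = arg_i, placed before the block's existing bindings.
            let mut copies: Vec<(u32, u32)> = params.into_iter().zip(args).collect();
            copies.extend(std::mem::take(&mut blocks[b].lets));
            blocks[b].lets = copies;
        }
        // Non-Jump predecessor: the params were dead and are simply dropped.
    }
}
```

After this pass, a block reached by one Jump has no params left, so the emitter has nothing to turn into a one-entry phi. Blocks with two or more predecessors, and the entry block, are left unchanged.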
- Implement Phase 5 in block_merge/single_pred_phi.rs
- Promote lower_parallel_copy to pub(super) in merge.rs
- Wire Phase 5 into merge_blocks() after Phase 4
- Update module docs from “Four-Phase” to “Five-Phase”
- Verify: J6 _ori_to_code pattern has no single-predecessor phis
- Verify: J12 try_div pattern has no single-predecessor phis
01.3 Completion Checklist
- Phase 5 (eliminate_single_pred_params) wired into merge_blocks() after Phase 4
- 7 ARC unit tests in block_merge/tests.rs (Jump params, non-Jump dead params, multi-pred negative, entry negative, COW preservation, span consistency, Branch-both-arms-same)
- 4 IR quality tests in ir_quality.rs (count_single_pred_phis utility + enum match, option propagation, single-entry merge, synthetic)
- Zero single-predecessor phi nodes in emitted IR for all tested patterns
- J6 _ori_to_code has no single-predecessor phi nodes
- J12 try_div has no single-predecessor phi nodes
- ./test-all.sh green (12,040 passed)
- ./clippy-all.sh green
- No regressions in cargo test -p ori_llvm
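The count_single_pred_phis utility named in the checklist is not shown in this document. One plausible shape for it, offered purely as an assumption, is a text scan over the emitted IR that counts phi instructions with exactly one incoming pair. The sketch relies on LLVM's textual form printing each incoming pair as "[ value, %label ]" with a space after the bracket, which distinguishes it from array types such as [4 x i8].

```rust
/// Counts `phi` instructions that list exactly one incoming `[ value, %label ]`
/// pair. Heuristic text scan; assumes one instruction per line.
fn count_single_pred_phis(ir: &str) -> usize {
    ir.lines()
        // Ignore trailing comments so commented-out phis are not counted.
        .map(|line| line.split(';').next().unwrap_or(""))
        .filter(|code| code.contains(" = phi "))
        // Incoming pairs print as "[ v, %bb ]"; array types print as "[4 x i8]".
        .filter(|code| code.matches("[ ").count() == 1)
        .count()
}
```

An IR quality test built on this would compile a pattern such as the J6 or J12 function and assert that the count is zero.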
01.4 Break Bridge Block Elimination
File(s): compiler/ori_arc/src/lower/control_flow/loops.rs (primary — loop lowering creates exit blocks with mutable var params), compiler/ori_llvm/src/codegen/arc_emitter/terminators.rs (secondary — LLVM emission of break paths)
Loop break paths emit trivial bridge blocks that just forward control flow. In J7’s _ori_sum_loop, the break path goes bb3→bb2 through a bridge block containing dead phi values (%v26 constant 0, %v27 unused loop counter). The break should branch directly to the function exit.
Root cause: The ARC lowerer creates an exit_block with block params for mutable variables (lines 93-94 in loops.rs). When a break value is emitted, the lowerer jumps to the exit block passing both the break value and the current mutable variable values. If the mutable variables are not used after the loop, these block params become dead phis in the LLVM IR.
Approach options:
- (a) ARC-level: Enhance the block merge pass to detect exit blocks whose params (other than the result) are unused after the block. Remove unused params and corresponding jump args. This generalizes beyond loops.
- (b) Loop lowering: Only add mutable vars to the exit block params if they are used after the loop. Requires forward analysis of variable usage.
- (c) LLVM emission: Detect and skip dead phi values during emission. Least desirable — should fix at source.
Note: Phase 5 (single-predecessor phi elimination) may already handle some of these cases if the exit block has a single predecessor. The remaining issue is when the exit block has multiple predecessors (e.g., both break and loop-end paths jump to exit).
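Approach (a) for the multi-predecessor case can be sketched as follows. This is a simplified model, not the pass itself: Block, Term, eliminate_dead_params, and retain_by_mask are hypothetical names, and each block's instruction operands are collapsed into a flat uses list. A param counts as dead when no instruction, jump argument, or return reads it anywhere in the function; because params are SSA names, that is the same as being unused after the block.

```rust
use std::collections::HashSet;

// Hypothetical toy IR: each block lists the variables its instructions read.
#[derive(Debug, Clone, PartialEq)]
enum Term {
    Jump { target: usize, args: Vec<u32> },
    Return(u32),
}

#[derive(Debug, Clone, PartialEq)]
struct Block {
    params: Vec<u32>,
    uses: Vec<u32>,
    term: Term,
}

/// Removes block params that are never read, together with the matching
/// argument position in every Jump that targets the block.
fn eliminate_dead_params(blocks: &mut [Block]) {
    // Every variable read by an instruction, a jump argument, or a return.
    let mut used: HashSet<u32> = HashSet::new();
    for block in blocks.iter() {
        used.extend(block.uses.iter().copied());
        match &block.term {
            Term::Jump { args, .. } => used.extend(args.iter().copied()),
            Term::Return(v) => {
                used.insert(*v);
            }
        }
    }
    // The entry block (index 0) keeps its params.
    for b in 1..blocks.len() {
        let keep: Vec<bool> = blocks[b].params.iter().map(|p| used.contains(p)).collect();
        if keep.iter().all(|k| *k) {
            continue;
        }
        retain_by_mask(&mut blocks[b].params, &keep);
        for pred in blocks.iter_mut() {
            if let Term::Jump { target, args } = &mut pred.term {
                if *target == b {
                    retain_by_mask(args, &keep);
                }
            }
        }
    }
}

/// Keeps `items[i]` only where `keep[i]` is true.
fn retain_by_mask(items: &mut Vec<u32>, keep: &[bool]) {
    let mut mask = keep.iter();
    items.retain(|_| *mask.next().unwrap_or(&true));
}
```

In the J7-style shape, where a break path and a loop-end path both jump to the exit block with a result and a loop counter, only the result param survives. The sketch is a single conservative pass: a variable that is only forwarded as a jump argument into a dead param still counts as used, so a real pass would likely iterate to a fixed point.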
TDD requirement: Write an IR-quality test capturing the current break bridge block pattern (the bb3->bb2 pattern with dead %v26/%v27 phis) BEFORE implementing. Also write an ARC IR unit test in block_merge/tests.rs for the break-exit-block pattern.
- Identify break bridge blocks in loop codegen; check if Phase 5 already handles the single-predecessor case
- For multi-predecessor exit blocks: implement dead-param elimination (remove block params whose values are never read after the exit block) in the block merge pass (approach a), as Phase 6 in block_merge/dead_param.rs
- Route break directly to the post-loop continuation block where possible. Structural bridge blocks remain (ARC Branch can’t carry args) but LLVM trivially eliminates these
- Ensure dead phi values from bridge blocks are not emitted
- Verify: J7 _ori_sum_loop break path has no intermediate bridge block
01.4 Completion Checklist
- J7 _ori_sum_loop break path branches directly to post-loop block (no intermediate bridge); single-break case handled by Phase 5
- No dead phi values (%v26, %v27 pattern) emitted in break bridge blocks; Phase 6 eliminates unused exit block params
- Loop break paths in all audited journey functions have no trivial bridge blocks; structural bridge blocks remain but contain no dead phis
- IR test: loop { if cond then break value } has no bridge block between break and post-loop (test_single_break_loop_clean_exit)
- IR test: multi-break loop has no dead phis in exit block (test_multi_break_loop_no_dead_phis)
- IR test: multi-break loop preserves live params when used after loop (test_multi_break_loop_preserves_live_params)
- ARC unit tests: dead_param_multi_pred_exit_block + dead_param_all_live_preserved in block_merge/tests.rs
- compiler/ori_llvm/tests/aot/ir_quality.rs tests updated for break bridge scope
- ./test-all.sh green (12,045 passed)
- ./clippy-all.sh green
- No regressions in cargo test -p ori_llvm
Section 01 Exit Criteria
All four subsections complete. Re-running code journeys 1–12 shows zero “redundant block” or “trivial branch” findings. Entry/exit/unwind blocks remain only where semantically required.