Section 09: Tail Call Optimization
Status: Complete Goal: Functions whose last action is a self-recursive call are compiled as loops via ARC-level loop lowering so they execute in constant stack space.
WARNING — VERY HIGH COMPLEXITY / VERY HIGH RISK: This is the most complex section in the plan. It requires coordinated changes across TWO crates (
ori_arcfor detection/rewriting,ori_llvmfor emission), touches the ARC pipeline’s RC insertion pass (the most correctness-critical pass), and requires proving that RcDec hoisting is safe per-variable. The ARC+TCO interaction is research-grade — Lean 4’s solution (ownParamsUsingArgs) required multiple iterations. Budget 2-3x the estimated effort.Decision (2026-03-05): ATTEMPT. The ARC+TCO interaction will be attempted, not deferred. Use the rollback plan (§09.2) if correctness cannot be verified.
Spec position (verified): The spec guarantees TCO. Annex E (System Considerations, §Recursion) states: “Tail calls are guaranteed to be optimized. A tail call does not consume stack space.” Clause 15 (§15.3, Tail Call Optimization) guarantees
recursepattern’sself(...)in tail position compiles to a loop with O(1) stack space. This makes TCO a correctness requirement, not merely a quality improvement.
Crate dependency order:
ori_arcdoes NOT depend onori_llvm. Changes flow downstream only: (1) tail call detection/rewriting inori_arcFIRST, (2) emission changes inori_llvmSECOND. These must be in separate commits or a single coordinated commit. Never modifyori_llvmto work around a missingori_arcfeature.
Context: J3’s gcd function is tail-recursive in the source (else gcd(a: b, b: a % b) is the last expression), but the generated code does not apply TCO. Despite fastcc being applied (which enables LLVM’s TCO machinery), the IR structure prevents the optimization. The actual ARC IR for gcd is:
bb2:
%10: int = Apply @gcd(%6, %9) // tail call
Jump bb3(%10) // jump to merge block
bb3: (%11: int)
Return %11 // return from merge
The call is in bb2, but Return is in the merge block bb3. LLVM requires ret immediately after tail call in the same basic block — the br to the merge block and the phi node break this. For gcd specifically (all-scalar parameters), there are no RcDec operations; the issue is purely the CFG merge structure. For functions with RC-managed parameters (strings, lists, etc.), the ARC pipeline also inserts RcDec between the call and the return, which would independently block TCO.
For deeply recursive inputs, this causes stack growth and eventual overflow. The Ori spec guarantees TCO (Annex E), making this a conformance issue, not just a quality improvement.
Journey affected: J3.
Reference implementations:
- Lean4
src/Lean/Compiler/IR/RC.lean: Detects tail calls and either converts to loops or emitsmusttail. - Koka
src/Core/CheckFBIP.hs: FBIP (functional-but-in-place) optimization includes tail call loop lowering. - LLVM
musttail: Guarantees tail call optimization — callee reuses caller’s stack frame.
09.1 Tail Call Detection
File(s): compiler/ori_arc/src/rc_insert/ (RC insertion — inserts RcDec between call and return), compiler/ori_arc/src/lib.rs (run_arc_pipeline — pipeline ordering)
New module
compiler/ori_arc/src/tail_call/hygiene requirements:
mod.rswith//!module docs describing pass purpose, inputs, outputs, and pipeline placementpub(crate)visibility on pass entry point (only called fromlib.rspipeline)#[cfg(test)] mod tests;declaration at bottom ofmod.rs- Sibling
tests.rswith ARC IR construction tests (usingtest_helpers.rs)///doc comments on allpub/pub(crate)functions#[tracing::instrument]on pass entry point- If module exceeds ~200 lines, split detection and rewrite into
detect.rsandrewrite.rssubmodules- Add
mod tail_call;declaration inlib.rs(alongside existingmod block_merge;)- File size budget: detection (~100-150 lines) + rewrite (~150-250 lines) + mod.rs dispatch (~50-80 lines) = ~300-480 lines max. If approaching 500, split immediately.
Phase boundary: Tail call detection MUST be in
ori_arc, notori_llvm. The detection operates on ARC IR (inspectingArcInstr::Applyinstructions,ArcTerminator::Return/ArcTerminator::Jumpterminators, and interveningArcInstr::RcDecoperations). Theori_llvmcrate only consumes the annotation.
TDD requirement: Write ARC IR unit tests in
ori_arcthat construct tail-recursive and non-tail-recursive ARC functions, run the detection pass, and assert correct annotation. Write LLVM AOT tests that verify the current non-optimized behavior (recursivecallpresent in IR) BEFORE implementing. Then implement and verify thecallis replaced.
Detect when a function’s return value is exactly the result of a self-recursive call with no intervening operations (no drops, no cleanup, no transformations).
Two obstacles to TCO (both must be addressed):
-
CFG merge block (all functions): ARC lowering places the tail call in one branch of an if/match, with a
Jumpto a shared merge block that doesReturn. The actual pattern isApply(self, args)in block N →Jump(merge, [result])→Return(result)in merge block. Even with zero RC operations, LLVM cannot TCO because the call and ret are in different basic blocks. Loop lowering at the ARC level (replacingApply+Jumpwith a directJumpback to the entry) handles this naturally. -
ARC cleanup (RC-managed functions only): The RC insertion pass (
rc_insert/) runs AFTER lowering and insertsRcDecoperations for variables whose last use is before the return. For tail-recursive functions with RC-managed parameters (strings, lists, etc.):- Parameters not passed to the recursive call get
RcDecbefore the jump - Local RC variables get
RcDecbefore the jump - These
RcDecoperations appear BETWEEN theApplyand theJumpterminator - This independently blocks TCO even if the merge block issue were resolved
- Parameters not passed to the recursive call get
Two options for handling RC cleanup (Recommendation: Option 2):
- Pre-RC tail call detection: Detect tail calls in the ARC IR BEFORE
rc_insertruns, then mark them.rc_insertcan then hoist drops before the call instead of between call and jump. - Post-RC tail call rewrite (recommended): After RC insertion, detect the pattern
Apply(self, args) → RcDec* → Jump(merge, [result])and rewrite toRcDec* → Apply(self, args) → Jump(merge, [result])(hoist drops before the call). This is simpler but must verify safety: dropping a value before the recursive call only works if the dropped value is not used by the call. This is the approach used by the pipeline placement in §09.2 (afterrc_identity+rc_elim, beforeblock_merge).
Criteria for tail call eligibility (at ARC IR level):
- The call is to the same function (self-recursion) — the
ArcInstr::Apply { func, .. }wherefunc == arc_func.name - The call result flows to the function’s return value with no transformation — typically
Apply(result)→Jump(merge, [result])→Return(result)across 2-3 blocks - No ARC operations (
RcDec/RcInc) between the call and the return path, OR all interveningRcDecoperations can be safely hoisted before the call - All
RcDectargets between call and return are NOT arguments to the recursive call (safe to drop before call)
Note: fastcc calling convention is an LLVM-level concern handled by select_call_conv() in ori_llvm/src/codegen/abi/mod.rs. All non-extern, non-main Ori functions already get fastcc. This is relevant for the musttail fallback but not for ARC-level loop lowering.
Existing infrastructure (must build on, not duplicate):
ori_arc/src/borrow/mod.rs—check_tail_call()(line ~425) already detects tail calls during borrow inference. It findsApplyinstructions whosedstmatches theReturnvalue and promotes borrowed params to owned to preserve tail call eligibility. This is the existing tail call detection — new detection should reuse or extend this logic, not duplicate it.ori_llvm/src/codegen/ir_builder/calls.rs—call_tail()(line ~50) already exists and setsset_tail_call(true)on LLVM call instructions. This is the LLVM-leveltailattribute (weaker thanmusttail). Currently unused by the ARC emitter.recursepattern sentinel — Therecurse(...)pattern lowers toApply @__recurse(...)in ARC IR (ori_arc/src/lower/constructs.rs, line ~171), using a sentinel name. Direct self-recursion (e.g.,gcd(a: b, b: a % b)) lowers toApply @gcd(...)with the actual function name. TCO detection must handle both patterns.- Borrow inference tail call preservation —
ori_arc/src/borrow/mod.rsmodule doc (§Tail Call Preservation) describes the ownership promotion strategy. When a tail call passes a borrowed param to an owned callee position, the param is promoted to owned — this avoids needingRcDecafter the tail call (Lean 4ownParamsUsingArgspattern).
CRITICAL: Existing check_tail_call limitation. The current check_tail_call() in borrow/mod.rs only examines blocks whose terminator is ArcTerminator::Return. It looks for an Apply whose dst matches the Return value in THE SAME BLOCK. But the actual ARC IR pattern for tail calls is cross-block: Apply in block N → Jump(merge, [result]) → Return(result) in merge block. The Apply and Return are in DIFFERENT blocks connected by a Jump. Therefore:
check_tail_call()as-is will NOT detect thegcdpattern (or any if/else tail call).- The new detection pass must trace through
Jumpterminators: start from aReturnblock, find the single predecessor’sJump, and check if the jump arg originated from anApplyin that predecessor. - Alternatively, detect during ARC lowering before the merge block is created (if pattern matching on the CanExpr is feasible).
- The borrow inference
check_tail_call()should still be kept for ownership promotion — it runs at a different phase and serves a different purpose.
__recurse sentinel resolution — MUST be addressed before detection:
The __recurse sentinel is created in ori_arc/src/lower/constructs.rs:171 during recurse() pattern lowering. It is NEVER resolved elsewhere — the LLVM emitter’s emit_apply will fail to find it in ctx.functions and log an “unresolved function” error. This is a latent bug for any program using recurse() compiled via AOT.
For TCO detection, __recurse must be resolved to the current function name. Two options:
- Resolve at lowering time — The
ArcLowererhas access to the function name viaself.buildercontext. Store the function name and emitApply @func_name(...)instead ofApply @__recurse(...). This is the cleanest fix. - Resolve at detection time — The TCO detection pass compares
Apply.funcagainst botharc_func.nameAND the interned__recursesentinel.
Recommendation: Option 1 — resolve at lowering time. This fixes the latent AOT bug AND simplifies TCO detection (only compare against arc_func.name). The sentinel was introduced because “ARC IR doesn’t track the current function name in the lowerer.” This is currently true: ArcIrBuilder has no func_name field, and ArcLowerer has no func_name field either. However, lower_function_can() receives name: Name (line 110) and constructs the ArcLowerer at line 153. Implementation: add a func_name: Name field to ArcLowerer, set it from the name parameter in lower_function_can(). Then lower_exp_recurse() in constructs.rs can use self.func_name instead of interning "__recurse". Verify that self.func_name matches the name passed to builder.finish(name, ...) (same Name value, trivially true).
If Option 1 is not feasible (e.g., unexpected issue with adding the field), use Option 2 and add a resolve_recurse_sentinel(func: &mut ArcFunction, interner: &StringInterner) helper that rewrites all Apply { func: __recurse, .. } to Apply { func: func.name, .. } as the first step of the detection pass.
- PREREQUISITE: Resolve
__recursesentinel to actual function name. Option 1 (preferred): (a) Addpub(crate) func_name: Namefield toArcLowererinlower/expr/mod.rs. (b) Set it fromnameparameter inlower_function_can()atlower/mod.rs:153. (c) Inlower/constructs.rs:lower_exp_recurse(), replaceself.interner.intern("__recurse")withself.func_name. If Option 2, addresolve_recurse_sentinel()helper as first step of detection pass. (2026-03-06: Also fixedself(...)calls — parser emitsIdent("self")notSelfRef, resolved tofunc_nameinlower_identandlower_call. Also fixedlower_exp_recurseto emit conditional control flow instead of eager evaluation.) - PREREQUISITE: Verify that
recurse()pattern programs compile and run correctly via AOT before and after the sentinel resolution fix (this may expose a latent AOT bug — see note above). (2026-03-06: Confirmed — before fix, AOT panicked with “unresolved functionself”. After fix, factorial(5)=120 and gcd(48,18)=6 both work correctly in AOT. All 4164 spec tests pass.) - Add tail call detection pass as a NEW function (not extending
check_tail_call()— different purpose, different phase). Detection must trace cross-block patterns:Return(v)→ predecessorJump(merge, [v])→Apply { dst: v, func: self_name }in predecessor block. (2026-03-06:tail_call::detect_tail_calls()inori_arc/src/tail_call/mod.rs) - Confirm pipeline placement in
run_arc_pipeline(): AFTERrc_identity+rc_elim(all RC ops finalized) and BEFOREblock_merge(as specified in §09.2’s pipeline code sample). Document placement with comment inrun_arc_pipeline(). Updatelib.rspipeline ordering documentation. (2026-03-06: placed at line ~205 with ordering comment, pipeline doc updated) - Handle: for RC-managed functions, the ARC pipeline inserts
RcDecbetween the tail callApplyand theJumpterminator — these must be hoisted before the call for TCO to work (2026-03-06: detection verifies all post-Apply instructions are RcDec; actual hoisting deferred to §09.2 rewrite) - Safety check: all hoisted
RcDectargets are NOT among the arguments to the recursiveApply(would be use-after-free). Formally: for eachRcDec { var: v }between Apply and Jump, assertvis not inApply.args. (2026-03-06:find_tail_apply_in_block()builds arg_set and rejects if any RcDec target overlaps) - Annotation format: Choose one of: (a) new field
tail_calls: Vec<(ArcBlockId, usize)>onArcFunction(block + instruction index), (b) newArcInstr::TailApplyvariant (replacesApplyfor eligible calls), or (c) bitflag onArcInstr::Apply. Recommendation: (a) — sidecar annotation, no ARC IR variant changes, minimal disruption. If (b) is chosen, ALL passes that match onArcInstr::Applymust also handleTailApply(grep forArcInstr::Applyacross all ofori_arcto enumerate). (2026-03-06: chose option (a) —tail_calls: Vec<TailCallSite>on ArcFunction withTailCallSite { call_block, call_instr_idx }) - Multi-clause functions: Ori functions with multiple clauses (e.g.,
@f (0: int) -> int = 1then@f (n) = n * f(n-1)) are lowered to a single ARC function with pattern-matching dispatch. The recursiveApply @f(...)uses the actual function name. Verify detection works for multi-clause functions (the Apply is inside a match arm block, which then Jumps to the merge). (2026-03-06: testsmulti_clause_tail_recursive_arm_detectedandmulti_clause_function_detectedverify both cases) - Mutual recursion exclusion: Only self-recursion is eligible:
Apply.func == arc_func.name. Mutual recursion (A calls B calls A) requires the callee’s stack frame to be compatible, which is not guaranteed. Verify thatApply.func != arc_func.namecorrectly excludes mutual recursion. Also excludeApplyIndirect(closure calls — callee identity unknown at compile time). (2026-03-06: detection checksfunc != func_nameand only matchesArcInstr::Apply, excluding ApplyIndirect) - Document in
tail_call/mod.rsmodule doc:ApplyIndirecttail calls (closure-based mutual recursion) are out of scope — callee identity unknown at compile time, cannot verify stack frame compatibility (2026-03-06: documented in module doc)
09.1 Completion Checklist
-
__recursesentinel resolved to actual function name (no unresolved__recursein ARC IR post-lowering) (2026-03-06) - Tail call detection correctly identifies self-recursive tail calls across the cross-block pattern (Apply → Jump → Return) (2026-03-06)
- Detection accounts for
RcDecoperations between Apply and Jump (verifies hoisting is safe; actual hoisting is in §09.2) (2026-03-06) - Non-tail calls (result transformed, intervening side effects) are correctly excluded (2026-03-06)
- Mutual recursion is correctly excluded (self-recursion only for this section) (2026-03-06)
-
ApplyIndirectcalls are excluded (callee identity unknown) (2026-03-06) - Multi-clause functions with self-recursive calls in match arms are detected (2026-03-06)
-
recurse()pattern calls (formerly__recursesentinel) are detected (2026-03-06: resolved at lowering time — emits Apply @func_name(…) directly) - Annotation format chosen and implemented (sidecar on
ArcFunctionor IR variant) (2026-03-06: option (a) —TailCallSitestruct withcall_block+call_instr_idx) -
run_arc_pipeline()updated with new pass placement and ordering comment (2026-03-06) -
//!module doc ontail_call/mod.rsdescribing pass purpose, inputs, outputs (2026-03-06) -
///doc comments on allpub/pub(crate)functions in the new module (2026-03-06) -
#[tracing::instrument(skip_all)]on pass entry point (2026-03-06) - All new source files (excluding tests) under 500 lines (2026-03-06: mod.rs ~170 lines)
-
tracing::debug!at pass entry/exit (function name, whether tail call found, how many rewritten) (2026-03-06) - ARC IR unit test: self-recursive tail call detected (scalar args) (2026-03-06)
- ARC IR unit test: self-recursive tail call detected (RC-managed args, RcDec hoisted) (2026-03-06)
- ARC IR unit test: non-tail call (result used in addition) NOT detected (2026-03-06)
- ARC IR unit test: mutual recursion NOT detected (Apply to different function name) (2026-03-06)
- ARC IR unit test:
RcDectarget used as arg to recursive call — NOT eligible (safety check) (2026-03-06) - All tests in sibling
tests.rsfiles (not inline#[cfg(test)]blocks with test bodies) (2026-03-06) -
./test-all.shgreen (2026-03-06: 12,125 tests passing) -
./clippy-all.shgreen (2026-03-06) -
./fmt-all.shgreen (2026-03-06)
09.2 Loop Lowering (Primary) or musttail Emission (Fallback)
File(s): compiler/ori_arc/src/ (ARC-level loop lowering — primary), compiler/ori_llvm/src/codegen/arc_emitter/terminators.rs (downstream: emission of the loop structure)
Phase boundary: Loop lowering converts the ARC tail-call pattern (
Apply(self, new_args)in branch block →Jump(merge, [result])→Return(result)in merge) into aJump(entry, new_args)back-edge — this is an ARC IR → ARC IR transformation. It MUST live inori_arc. Theterminators.rsfile inori_llvmonly needs to emit the resultingJumpterminator (which it already handles — seeArcTerminator::Jumpinterminators.rs). Do NOT add tail-call-specific logic toori_llvm.
Pipeline ordering: If implemented as a new pass, it must be added to
run_arc_pipeline()inori_arc/src/lib.rswith explicit documentation of where it runs relative torc_insert. Per ARC pipeline rules: do NOT add passes without updatingrun_arc_pipeline(). Do NOT call out of order.
Recommended approach: Loop lowering (primary). Two independent obstacles prevent musttail: (1) the ARC IR lowers tail calls through a merge block (Apply → Jump(merge) → Return), putting the call and ret in different basic blocks; (2) for RC-managed functions, RcDec operations appear between the call and the jump. Loop lowering at the ARC IR level (replacing the tail Apply + Jump with a back-edge Jump to the entry) avoids both issues entirely.
musttail (fallback): Would require both: (a) the Apply and Return in the same basic block (no merge block), AND (b) zero ARC cleanup between them. The current ARC IR structure never satisfies (a), making musttail impractical without a separate ARC IR restructuring pass. Loop lowering is strictly superior.
; CURRENT ARC IR (before TCO):
; bb0:
; %4: bool = %1 == 0
; Branch %4 ? bb1 : bb2
; bb1: ; base case
; Jump bb3(%0)
; bb2: ; recursive case
; %9: int = %0 % %1
; %10: int = Apply @gcd(%1, %9)
; Jump bb3(%10)
; bb3: (%11: int)
; Return %11
; AFTER LOOP LOWERING (ARC IR rewrite):
; bb0: ; entry — jumps to loop header
; Jump bb1(%0, %1)
; bb1: (%a: int, %b: int) ; loop header (was merge point)
; %done = %b == 0
; Branch %done ? bb2 : bb3
; bb2: ; exit
; Return %a
; bb3: ; loop body
; %rem = %a % %b
; Jump bb1(%b, %rem) ; back-edge (was recursive Apply)
; RESULTING LLVM IR:
bb0:
br label %loop
loop:
%a = phi i64 [%a_init, %bb0], [%b, %loop.body]
%b = phi i64 [%b_init, %bb0], [%rem, %loop.body]
%done = icmp eq i64 %b, 0
br i1 %done, label %exit, label %loop.body
loop.body:
%rem = srem i64 %a, %b
br label %loop
exit:
ret i64 %a
; FALLBACK (musttail, only when no ARC cleanup AND same-block ret):
%result = musttail call fastcc i64 @_ori_gcd(i64 %b, i64 %rem)
ret i64 %result
Requirements for musttail (fallback only):
- Caller and callee must have the same number and type of parameters
- Caller and callee must use the same calling convention
- The
retmust immediately follow the call (no intervening instructions — no RcDec!) - Return ABI must match exactly (including sret/aggregate conventions)
- Varargs and ABI-mismatched signatures are ineligible
Loop Lowering Algorithm
COMPLEXITY WARNING: The loop lowering rewrite modifies the function’s block structure (creates new blocks, removes old ones, rewrites terminators and block params). This is comparable in complexity to
block_merge/select.rs(300 lines) orblock_merge/merge.rs(173 lines). The ArcIrBuilder is designed for forward construction, not in-place mutation — the rewrite operates directly onArcFunction.blocks(Vec mutation). Studyblock_merge/compact.rs(block renumbering after removal) andblock_merge/merge.rs(terminator rewriting) for established patterns. Do NOT invent new mutation patterns — reuse existing infrastructure.
The rewrite operates on the ARC IR after tail call detection (09.1) has annotated eligible calls. For each annotated tail call:
- Identify the pattern: Find the
Apply { dst: result, func: self_name, args: new_args }in block B, followed byJump { target: merge, args: [result] }as B’s terminator, where merge block’s terminator isReturn { value: merge_param }. - Create loop header: Create a new block (or repurpose the merge block) with block params matching the function’s parameters. The entry block gets a
Jumpto this header with the original parameter values. - Rewrite base case: The base-case branch (which previously jumped to merge with the base result) now jumps to an exit block that does
Return. - Rewrite recursive case: Replace the
Apply+Jump(merge, [result])withRcDec*(hoisted from between Apply and Jump) +Jump(header, new_args). The new args are the recursive call’s arguments, which become the next iteration’s “parameters.” - Handle multiple tail call sites: If the function has multiple tail-recursive paths (e.g.,
matcharms that each recursively call), each path gets its own back-edge to the header. - Remove dead merge block: The original merge block is no longer needed (the exit block replaces it). Remove it during a compact step.
Pipeline Placement (MUST be explicit)
The loop lowering pass MUST be placed in run_arc_pipeline() in ori_arc/src/lib.rs. Recommended placement:
...existing passes...
rc_identity::propagate_rc_identity(func, &identity_map, pool); // existing — normalize RC targets
rc_elim::eliminate_rc_ops_dataflow(func, &ownership); // existing — remove dead RC ops
tail_call::rewrite_tail_calls(func, interner); // NEW — after RC ops are final
block_merge::merge_blocks(func); // existing — cleans up dead blocks from TCO rewrite
...remaining passes...
Why after RC elimination: The RcDec operations are in their final positions — identity propagation has normalized targets and elimination has removed dead pairs. We can safely determine which drops to hoist and which are already eliminated. Running before RC elimination would require the pass to reason about RC operations that may later be removed.
Why before block_merge: The loop lowering creates new blocks (header, exit) and removes the merge block. Block merge (Phase 1: compact) will clean up any dead blocks. Phase 4 (merge jump chains) may further simplify the entry → header chain. This is the same pattern as other ARC passes — create the structure, let block_merge clean it up.
Update run_arc_pipeline() documentation: Add a comment explaining the new pass’s purpose and ordering rationale. Update the module doc in ori_arc/src/lib.rs (pipeline listing at line ~46) to include the tail call pass.
NOTE:
lib.rsfunction body exception. Per hygiene rules,lib.rsshould be an index file (no function bodies).ori_arc/src/lib.rsis a known exception — it containsrun_arc_pipeline(),run_arc_pipeline_all(), andrun_uniqueness_analysis()(382 lines currently). Adding a single function call to the tail call pass within the existingrun_arc_pipeline()body is acceptable. Do NOT add any new function definitions tolib.rs— the pass implementation must live intail_call/mod.rs. Iflib.rsapproaches 500 lines, extract the pipeline functions into apipeline.rssubmodule.
Interaction with block_merge Pass
The loop-lowered function creates a back-edge from the recursive-case block to the header. This creates a multi-predecessor block (the header), which is:
- NOT merged by Phase 4 (merge only targets single-predecessor blocks). Correct — we want the header to remain.
- Subject to Phase 1 (compact) — the dead merge block will be removed. Correct.
- Subject to Phase 5 (single-pred phi) — the exit block may be single-predecessor; if its params are unnecessary, they’ll be cleaned up. Correct.
- Subject to Phase 6 (dead param) — if any header block params are unused (e.g., a parameter that is never read in the loop body), they’ll be removed. Correct.
The entry block’s Jump(header, original_params) is a single-predecessor edge to the header (the header also has the back-edge from the body). Phase 4 will NOT merge entry into header because header has 2 predecessors. This is correct.
Conclusion: No special handling needed. Block merge naturally handles the TCO-generated loop structure.
- Create
compiler/ori_arc/src/tail_call/module directory withmod.rs(dispatch +//!module docs), andtests.rs(sibling test file with#[cfg(test)] mod tests;inmod.rs). Split intorewrite.rssubmodule. (2026-03-06: mod.rs ~197 lines, rewrite.rs ~106 lines) - Pass entry point:
pub(crate) fn rewrite_tail_calls(func: &mut ArcFunction)—pub(crate)(only called fromlib.rs), with#[tracing::instrument(skip_all)]and///doc comment. Note:internerparam omitted (unused — follows “no dead code” rule). (2026-03-06) - Implement the 6-step rewrite algorithm described above (2026-03-06: trampoline + header block params + back-edge rewrite)
- Add
mod tail_call;tolib.rsmodule declarations (alongsidemod block_merge;) (2026-03-06: added in §09.1) - Add single call
tail_call::rewrite_tail_calls(func);torun_arc_pipeline()AFTERrc_elimand BEFOREblock_merge— with ordering comment (2026-03-06: line ~212) - Update
ori_arc/src/lib.rsmodule doc (pipeline listing at line ~46) to include the tail call pass (2026-03-06) - Detect tail position: call result flows through Jump args to Return (cross-block pattern) (2026-03-06: handled in §09.1 detection)
- Handle ARC cleanup: RcDec operations between Apply and Jump remain in place — they clean up the current iteration’s values before the back-edge Jump starts the next iteration. (2026-03-06: verified by
rewrite_preserves_rc_decsunit test) - Handle multiple tail call sites in a single function (multiple match arms with self-calls) (2026-03-06:
rewrite_handles_multiple_tail_call_sitesunit test) - DEFERRED:
musttailfallback not needed — loop lowering handles all cases including zero-ARC functions.musttailwould only save onebr labelvs the trampoline jump, which LLVM optimizes away anyway. - Write stress test with linear-depth tail recursion (e.g., countdown to 0 at depth >= 100_000) (2026-03-06:
test_tail_rec_countdown_deepat 200,000 depth) - Verify stress test runs without stack overflow in AOT (2026-03-06: passes in both debug and release)
- Verify: non-tail-recursive functions are unaffected (no structural changes to non-tail-recursive functions) (2026-03-06:
rewrite_does_not_modify_non_tail_functionunit test + all existing recursion tests pass) - ARC-safe loop lowering is defensible — verified by unit tests, AOT tests, leak checks, and RC trace (2026-03-06)
Rollback Plan
If tail call optimization introduces correctness regressions (leaks, double-frees, wrong results):
- Revert the tail call detection/rewrite pass — specific files to revert:
compiler/ori_arc/src/tail_call/(ortail_call.rs) — the new detection/rewrite module (delete entirely)compiler/ori_arc/src/lib.rs— remove the pass invocation fromrun_arc_pipeline()and pipeline doccompiler/ori_arc/src/lower/constructs.rs— revert__recursesentinel resolution if changed (restore__recursesentinel)compiler/ori_arc/src/ir/mod.rs— remove any newArcInstrvariant orArcFunctionfield- Keep all NEW tests — convert them to
#[ignore = "TCO reverted: ARC cleanup hoisting safety unverified"]
- Mark L-10 as deferred in §10.8 with rationale: “ARC cleanup hoisting safety could not be verified”
- Do NOT leave a partially-working TCO pass — it must be all-or-nothing. If detection works but rewrite is unsound, revert BOTH.
- If the
__recurseresolution fix (prerequisite) is independently correct and does not regress, it may be kept even if the TCO pass is reverted — it fixes a latent AOT bug.
09.2 Completion Checklist
- Self-recursive tail calls compiled as loops (no
callto self in IR) — verified bytest_tail_recursive_gcd_has_no_self_callIR quality test (2026-03-06) - J3
gcdfunction emits a loop, not a recursive call — IR containsphi i64+br labelback-edge, nocall fastcc i64 @_ori_gcd(2026-03-06) - Stress test: tail recursion at depth >= 100,000 runs without stack overflow in AOT —
test_tail_rec_countdown_deepat 200,000 (2026-03-06) - ARC cleanup (RcDec) preserved correctly before loop back-edge — no leaks, no double-free —
rewrite_preserves_rc_decsunit test +ORI_CHECK_LEAKS=1(2026-03-06) -
ORI_CHECK_LEAKS=1on tail-recursive programs reports 0 leaks — verified on gcd, countdown, factorial, list-param (2026-03-06) -
ORI_TRACE_RC=1on tail-recursive programs shows balanced RC ops — gcd: 0 RC ops (all scalar); list-param: 1 alloc, 1 dec, 1 free (2026-03-06) - Non-tail-recursive functions unchanged (no false positives) —
rewrite_does_not_modify_non_tail_functionunit test + all existing recursion tests pass unchanged (2026-03-06) - Functions where tail call rewrite is UNSAFE (RcDec target used by recursive call) are correctly excluded —
rc_dec_target_used_as_arg_not_eligibleunit test (2026-03-06) -
run_arc_pipeline()inori_arc/src/lib.rsupdated with new pass and ordering comment (2026-03-06) -
ori_arc/src/lib.rsmodule doc updated to include tail call pass in pipeline listing (2026-03-06) -
./fmt-all.shgreen (2026-03-06) - All new source files under 500 lines (excluding tests) — mod.rs ~197 lines, rewrite.rs ~106 lines (2026-03-06)
- No
#[allow(clippy::...)]without justification —./clippy-all.shgreen (2026-03-06)
Test scenarios (all must be covered):
- AOT test: scalar args only (gcd with int params) —
test_tail_rec_gcd_correct+ IR quality test (2026-03-06) - AOT test: RC-managed args —
test_tail_rec_with_list_param(100K iterations, 1 alloc/1 free) +test_tail_rec_with_string_param(2026-03-06) - AOT test: mixed scalar and RC-managed args —
test_tail_rec_with_list_paramhas both list (RC) and int (scalar) params (2026-03-06) - AOT test: nested tail calls in match arms —
test_tail_rec_collatz_deephas 3 match arms (base, even, odd) (2026-03-06) - AOT test: tail call in if/else branches —
test_tail_rec_if_else_both_branches(2026-03-06) - AOT test:
recurse()pattern —test_tail_rec_recurse_pattern+test_tail_rec_recurse_deep(200K depth) (2026-03-06) - BLOCKED: multi-clause AOT test — pre-existing multi-clause AOT codegen bug (type resolution emits
build_structon non-struct LLVM type, causes SIGSEGV). Not a TCO regression — multi-clause functions segfault independently of TCO. ARC IR unit tests (multi_clause_tail_recursive_arm_detected,multi_clause_function_detected) verify detection works correctly. - AOT test: function with both tail and non-tail recursive calls —
test_tail_rec_mixed_tail_and_nontail(2026-03-06) - AOT test: deep tail recursion (>= 100,000 depth) —
test_tail_rec_countdown_deep(200,000) +test_tail_rec_recurse_deep(200,000) (2026-03-06) - AOT test: deep non-tail recursion (e.g., 100 depth) —
test_rec_depth_100+test_rec_depth_1000(pre-existing) (2026-03-06) - Negative test: non-tail call (result used in
result + 1) —test_rec_factorial(n * f(n-1)),test_rec_sum_to(n + f(n-1)) all pass unchanged (2026-03-06) - Negative test: mutual recursion —
test_rec_is_even_odd,test_rec_mutual_countdownall pass unchanged (2026-03-06) - Spec test in
tests/spec/: tail recursion —tests/spec/codegen/tail_call_test.ori(gcd, countdown, factorial, collatz at interpreter-safe depth 400). NOTE: depth > 10,000 not possible in interpreter (500 depth limit) — covered by AOT tests at 200K depth. (2026-03-06) - Spec test:
recurse()pattern —recurse()uses the same lowering as direct recursion after sentinel resolution. AOT tests at 200K depth cover this. (2026-03-06) - IR quality test:
test_tail_recursive_gcd_has_no_self_callinir_quality_codegen.rs— nocall fastcc i64 @_ori_gcd, hasbr label, hasphi i64(2026-03-06) -
./test-all.shgreen — 12,146 tests pass (2026-03-06) -
./clippy-all.shgreen (2026-03-06) - No regressions in
cargo test -p ori_llvm— 435 unit tests, 1242 AOT tests pass (2026-03-06)
Test Locations
All tests must follow sibling tests.rs conventions and be placed in the correct file for their scope:
| Test Type | File | Convention |
|---|---|---|
| ARC IR unit tests (detection, rewrite) | compiler/ori_arc/src/tail_call/tests.rs (new) | Sibling tests.rs |
| ARC IR unit tests (cross-block detection) | compiler/ori_arc/src/tail_call/tests.rs | Construct ARC functions, run pass, assert annotations |
| LLVM IR quality tests (loop in IR) | compiler/ori_llvm/tests/aot/ir_quality.rs | Integration test — compile Ori, check IR patterns |
| AOT execution tests (deep recursion) | compiler/ori_llvm/tests/aot/tail_call.rs (new) or compiler/ori_llvm/tests/aot/arc.rs | Compile and execute, assert no crash |
| Spec tests (functional behavior) | tests/spec/recursion/ or tests/spec/functions/ | .ori spec tests with deep recursion |
| Leak/RC balance tests | compiler/ori_llvm/tests/aot/arc.rs | ORI_CHECK_LEAKS=1 + ORI_TRACE_RC=1 |
Cross-Section Interactions
§04 (ARC Closure Lifecycle) + §09 (Tail Calls)
Coordination required (noted in 00-overview.md): Tail-call lowering must not regress closure/environment lifetime correctness. Specifically:
- If the tail-recursive function captures variables in a closure environment, the
RcDecfor the environment must happen at the right time. - Loop lowering replaces the recursive
Applywith aJumpback-edge. AnyRcDecfor closure environments that was between Apply and Jump must be preserved as part of the loop body (before the back-edge Jump). - §04’s closure environment RC fixes are complete — verify that the loop-lowered code still calls
RcDecon closure environments at the right time.
§08 (Loop IR Quality) + §09 (Tail Calls)
As noted in §08’s cross-section analysis: tail call optimization converts recursive calls to loops. The resulting loop has the same structure as a lower_loop loop (header block with params, body, back-edge). §08.2’s invariant-param elimination and §08.1’s CSE apply equally to TCO-generated loops.
Implementation order: §08 SHOULD be implemented before §09 so the loop optimization infrastructure is tested on simpler source-level loops before being applied to TCO-generated loops. TCO-generated loops can then benefit from §08’s optimizations automatically via block_merge.
§01 (Block Merging) + §09 (Tail Calls)
The block_merge pass runs AFTER the tail call rewrite in run_arc_pipeline(). It will:
- Remove the dead merge block (Phase 1: compact)
- Potentially merge the entry → header chain if entry is a single-predecessor pass-through (Phase 4: merge jump chains). However, the header has 2 predecessors (entry + back-edge), so Phase 4 will NOT merge entry into header.
- Clean up any dead block params on the exit block (Phase 6: dead params)
No special handling needed — block_merge operates correctly on TCO-generated loops.
Section 09 Exit Criteria
J3’s gcd function is stack-safe (via loop lowering). A recursion-depth stress test (>100,000) runs without stack overflow in AOT mode. Non-tail-recursive functions show no behavioral change. run_arc_pipeline() is updated with the new pass. __recurse sentinel is resolved. Both recurse() pattern and direct self-recursion are optimized. If deferred, §10.8 has concrete rationale and follow-up plan.