Section 10: Thread-Local Non-Atomic ARC
Context: Currently, ori_rt uses AtomicI64 with Relaxed/Release/Acquire ordering for all RC operations. This is correct for thread-shared values but wasteful for thread-local ones. Most values in most programs never cross thread boundaries — they’re created, used, and freed within a single thread.
Rust solved this by having two types: Rc (non-atomic, thread-local) and Arc (atomic, thread-safe). Ori doesn’t expose this distinction to the programmer — the compiler decides automatically.
Reference implementations:
- Rust `library/alloc/src/rc.rs` vs `library/alloc/src/sync.rs`: `Rc` uses `Cell<usize>` (non-atomic), `Arc` uses `AtomicUsize`. Programmer chooses.
- Swift: All RC is atomic by default, but `isKnownUniquelyReferenced()` enables COW without RC overhead. No automatic non-atomic promotion.
- CPython: GIL-protected, so all RC is effectively non-atomic because only one thread runs at a time.
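To make the `Cell<usize>` versus `AtomicUsize` contrast concrete, here is a minimal sketch of the two counter styles. The type names `LocalCount` and `SharedCount` are illustrative only and are not part of `ori_rt` or of Rust's standard library.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Thread-local style count (illustrative): a plain load followed by a plain store.
pub struct LocalCount(Cell<usize>);

/// Thread-shared style count (illustrative): one atomic read-modify-write.
pub struct SharedCount(AtomicUsize);

impl LocalCount {
    pub fn new() -> Self {
        LocalCount(Cell::new(1))
    }

    /// Increments the count and returns the new value.
    pub fn inc(&self) -> usize {
        let n = self.0.get() + 1;
        self.0.set(n);
        n
    }
}

impl SharedCount {
    pub fn new() -> Self {
        SharedCount(AtomicUsize::new(1))
    }

    /// Increments the count and returns the new value.
    pub fn inc(&self) -> usize {
        // fetch_add returns the previous value, so add 1 to get the new one.
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
}
```

Because `Cell<usize>` is not `Sync`, Rust rejects sharing `LocalCount` across threads at compile time. Ori has no such programmer-visible split, which is why the equivalent guarantee has to come from the compiler's analysis.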
Depends on: §08 (escape analysis to determine thread-locality), §09 (header compression — non-atomic and atomic headers differ).
10.1 Thread Escape Analysis
File(s): compiler/ori_repr/src/escape/thread.rs
Extend escape analysis (§08) to track thread boundaries.
- Define thread escape:

      #[derive(Debug, Clone, Copy, PartialEq, Eq)]
      pub enum ThreadLocality {
          /// Value never crosses a thread boundary
          ThreadLocal,
          /// Value may be shared across threads
          ThreadShared,
          /// Unknown (conservative: treat as ThreadShared)
          Unknown,
      }

- Identify thread boundary operations:
  - `spawn()`: values captured by the spawned closure cross threads
  - `chan.send(value)`: value crosses thread via channel
  - Global mutable state (if Ori adds it): shared by all threads
  - FFI calls with unknown thread behavior → conservative (ThreadShared)
- Propagate thread-locality:

      pub fn analyze_thread_locality(
          func: &ArcFunction,
          escape_info: &EscapeInfo,
          pool: &Pool,
      ) -> FxHashMap<AllocId, ThreadLocality> {
          let mut locality = FxHashMap::default();
          for alloc in func.allocations() {
              if escape_info.escape_state(alloc) == EscapeState::NoEscape {
                  // Non-escaping → definitely thread-local
                  locality.insert(alloc, ThreadLocality::ThreadLocal);
                  continue;
              }
              // Check if any escape path crosses a thread boundary
              let crosses_thread = escape_info
                  .escape_paths(alloc)
                  .any(|path| path.crosses_thread_boundary());
              locality.insert(
                  alloc,
                  if crosses_thread {
                      ThreadLocality::ThreadShared
                  } else {
                      ThreadLocality::ThreadLocal
                  },
              );
          }
          locality
      }

- Whole-program optimization:
  - If the program has NO `spawn()` calls, NO channel operations, and NO FFI export of Ori-managed pointers (per RL-21, see §10.2) → ALL values are ThreadLocal
  - This is detectable with a simple call graph scan
  - Enables ALL RC operations to be non-atomic for single-threaded programs
- `/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (10.1): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-10.1 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 10.1: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
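The whole-program check from §10.1 can be sketched as a reachability walk over a call graph. This is a toy model under stated assumptions: the `Op` enum and the `HashMap`-based program representation are hypothetical stand-ins, not the real `ArcFunction` IR, and every extern call is treated as a thread boundary to stay conservative.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified operation kinds inside a function body.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Op {
    Call(String),
    Spawn,
    ChanSend,
    ExternCall,
    Other,
}

/// Returns true if no function reachable from `entry` contains a
/// thread boundary operation. Unknown callees are treated as unsafe.
pub fn is_single_threaded(program: &HashMap<String, Vec<Op>>, entry: &str) -> bool {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut work: Vec<&str> = vec![entry];
    while let Some(name) = work.pop() {
        if !seen.insert(name) {
            continue; // already visited
        }
        let Some(body) = program.get(name) else {
            return false; // body not visible: be conservative
        };
        for op in body {
            match op {
                Op::Spawn | Op::ChanSend | Op::ExternCall => return false,
                Op::Call(callee) => work.push(callee.as_str()),
                Op::Other => {}
            }
        }
    }
    true
}
```

A program whose `main` only calls pure helpers passes the scan; a single reachable `Spawn` anywhere in the graph makes the whole program fall back to per-allocation analysis.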
10.2 Non-Atomic RC Runtime
File(s): compiler/ori_rt/src/rc/nonatomic.rs (new file inside rc/ module)
Module placement: Must live inside rc/ (e.g., rc/nonatomic.rs with mod nonatomic; in rc/mod.rs) to access call_drop_fn and rc_underflow_abort which are pub(super). Note: ori_rt allows unsafe (it is NOT in the #![deny(unsafe_code)] list). Every unsafe block MUST have a // SAFETY: comment.
Risk warning: Non-atomic RC on a value that IS actually shared across threads causes data races (undefined behavior). The soundness of this entire section depends on §08’s escape analysis and §10.1’s thread escape analysis being correct. If the analysis is unsound, this creates UB that Valgrind (memcheck) will not catch — only helgrind/TSAN will. This section must be gated on §08 being fully verified first.
- Add a debug-only RC mode guard before exposing any non-atomic runtime entry point:
  - Store an `RcMode` flag (`Atomic` vs `NonAtomic`) in debug builds only, either in a side table or a debug-only header word
  - `ori_rc_inc` / `ori_rc_dec` assert they are not touching an allocation marked non-atomic
  - `ori_rc_inc_nonatomic` / `ori_rc_dec_nonatomic` assert they are not touching an allocation marked atomic
  - Release builds pay zero cost for this guard
  - This guard is mandatory; helgrind is a secondary verifier, not the only safety net
- Implement non-atomic RC operations:

      #[no_mangle]
      pub unsafe extern "C" fn ori_rc_inc_nonatomic(data_ptr: *mut u8) {
          if data_ptr.is_null() {
              return;
          }
          // SAFETY: data_ptr was returned by ori_rc_alloc; strong_count is at data_ptr - 8.
          let rc_ptr = data_ptr.sub(8).cast::<i64>();
          let count = *rc_ptr; // plain load (no atomic)
          if count >= MAX_REFCOUNT {
              std::process::abort();
          }
          *rc_ptr = count + 1; // plain store (no atomic)
      }

      #[no_mangle]
      pub unsafe extern "C" fn ori_rc_dec_nonatomic(
          data_ptr: *mut u8,
          drop_fn: Option<extern "C" fn(*mut u8)>,
      ) {
          if data_ptr.is_null() {
              return;
          }
          // SAFETY: data_ptr was returned by ori_rc_alloc; strong_count is at data_ptr - 8.
          let rc_ptr = data_ptr.sub(8).cast::<i64>();
          let count = *rc_ptr; // plain load (no atomic)
          // Underflow protection: matches ori_rc_dec (rc/mod.rs).
          // Always-on, not debug-only. Catches double-free bugs.
          if count <= 0 {
              rc_underflow_abort(data_ptr);
          }
          *rc_ptr = count - 1; // plain store (no atomic)
          if count == 1 {
              // Last reference: drop via abort-on-panic guard.
              // ori_rc_dec_nonatomic is nounwind; unwinding through it is UB.
              if let Some(f) = drop_fn {
                  call_drop_fn(f, data_ptr);
              }
          }
      }

- Also provide width-specific non-atomic variants:
  - `ori_rc_inc_nonatomic_i8`, `ori_rc_dec_nonatomic_i8`
  - `ori_rc_inc_nonatomic_i16`, `ori_rc_dec_nonatomic_i16`
  - Combines with §09 header compression
- LLVM codegen selects atomic vs. non-atomic based on `ReprPlan::rc_strategy()`:

      match repr_plan.rc_strategy(type_idx) {
          RcStrategy::None => { /* skip RC */ }
          RcStrategy::Atomic { width } => {
              // Call ori_rc_inc_$width / ori_rc_dec_$width (§09 width-suffixed)
              // ABI: inc(data_ptr), dec(data_ptr, drop_fn); matches existing contract
              emit_atomic_rc(width);
          }
          RcStrategy::NonAtomic { width } => {
              // Call ori_rc_inc_nonatomic_$width / ori_rc_dec_nonatomic_$width
              // Same 2-arg dec ABI: dec(data_ptr, drop_fn)
              emit_nonatomic_rc(width);
          }
      }

- RL-19/20/21 dispatch populates the §06.3 `RcAtomicity` carrier (relocated IN from `plans/aims-burden-tracking/section-06-phase7-mechanical-lowering.md §06.3` per `routing.md §4`; see §06 HISTORY 2026-06-02 a10). §06.3 delivered the carrier SITE: `RcAtomicity { Atomic, NonAtomic }` on the `RcInc`/`RcDec` `ArcInstr` variants (`compiler/ori_arc/src/ir/instr.rs` + `ir/repr.rs`), defaulted to `Atomic` at every construction site (reproducing the shipped unconditionally-atomic runtime). §10.2 owns the ACTIVATION: the §10.1 `ThreadLocality` analysis + this section’s non-atomic runtime feed the realization dispatch that POPULATES `RcAtomicity::NonAtomic` (per `aims-rules.md §8 RL-19/RL-20/RL-21`, thread-locality derived from `Locality` + call-graph per `RL-18a` SSOT; NO parallel per-variable escape enum). The codegen selection above (`emit_atomic_rc`/`emit_nonatomic_rc`) reads the carrier (OR the `ReprPlan::rc_strategy()` projection: §10’s design choice, reconciled at implementation). Cross-thread/Sendable → `Atomic` (RL-20); non-Sendable or provably-thread-local → `NonAtomic` (RL-19); whole-program no-spawn/no-channel/no-FFI-export → all `NonAtomic` (RL-21).
  - RC-coalescing atomicity-mismatch flush (TPR-06-1-codex, Major; surfaced reviewing §06.3 carrier; owner is §10.2 because the gap only bites once NonAtomic is POPULATED here): `compiler/ori_arc/src/aims/emit_rc/coalesce/mod.rs` `PendingRc` carries `incs`/`decs`/`strategy` but NO atomicity field, and its flush hardcodes `RcAtomicity::default_atomic()`. Today this is correct + emission-neutral (all ops Atomic). Once this checkbox POPULATES `NonAtomic`, the coalesce window will silently flatten a `NonAtomic` op, or a mixed `Atomic`+`NonAtomic` merge, to `Atomic` (a latent miscompile: it currently flushes only on `strategy` mismatch, never on atomicity mismatch). The NonAtomic-population work MUST extend `PendingRc` to carry atomicity AND add an atomicity-mismatch flush mirroring the existing `strategy`-mismatch flush, so a `NonAtomic` op never coalesces into an `Atomic` window (and vice-versa). Verify via a coalesce-window FileCheck pin asserting a mixed-atomicity adjacent pair is NOT merged.
- Tests (relocated IN from §06.3 per `routing.md §4`: the NonAtomic-SELECTION tests require this section’s selecting backend to exercise; §06.3 keeps only the carrier-default SITE pins): per `tests.md §Matrix Testing Rule` semantic + negative pairing:
  - Cross-thread-escape produces `RcAtomicity::NonAtomic`-vs-`Atomic` selection on lowered ops; FileCheck pins the variant + eval+LLVM parity. EXERCISE PATH: the channel-send `Producer<T>`/`Consumer<T>` source path is blocked by E2040 at the frontend (tracked as BUG-02-037), so the primary cross-thread atomic-RC exercise is an FFI pointer export (`extern "c"` boundary handing an Ori-managed pointer out per RL-21, a thread-escape path that compiles today). Add the channel-send variant as a `#skip("BUG-02-037: channel-send frontend compile-block (E2040) — re-enable on BUG-02-037 resolution")` companion case so the channel atomic-RC shape is tracked, not silently dropped.
  - Intra-thread Sendable produces `RcAtomicity::NonAtomic`; negative pin: reverting the RL-19 dispatch fails the test (no false-atomic).
  - Program with no spawn/channel + no FFI exporting Ori-managed pointers (per RL-21) → all ops `NonAtomic`; semantic pin verifies the whole-program assertion (overlaps the §10.4 test matrix row “Single-threaded program: no `spawn()`, no `channel()`”).
- `/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (10.2): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-10.2 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 10.2: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
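The atomicity-mismatch flush required in §10.2 can be illustrated with a deliberately reduced model of a coalescing window. Everything here is a hypothetical simplification: the real `PendingRc` tracks `incs`/`decs`/`strategy`, whereas this sketch folds them into a single signed `delta` and keeps only the merge rule that matters.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RcAtomicity {
    Atomic,
    NonAtomic,
}

/// One RC adjustment on a variable (hypothetical simplification of the IR op).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RcOp {
    pub var: u32,
    pub delta: i32,
    pub atomicity: RcAtomicity,
}

/// Merges adjacent ops on the same variable, but only when their
/// atomicity matches. A mismatch ends (flushes) the pending window.
pub fn coalesce(ops: &[RcOp]) -> Vec<RcOp> {
    let mut out: Vec<RcOp> = Vec::new();
    for &op in ops {
        if let Some(pending) = out.last_mut() {
            if pending.var == op.var && pending.atomicity == op.atomicity {
                pending.delta += op.delta;
                continue;
            }
        }
        // Different variable or different atomicity: start a new window.
        out.push(op);
    }
    out
}
```

The property the FileCheck pin is meant to protect is visible here: two adjacent `NonAtomic` increments collapse into one op, while a `NonAtomic` op next to an `Atomic` op on the same variable stays as two ops.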
10.3 Migration Fence
File(s): compiler/ori_repr/src/arc_opt/migration.rs
If a value transitions from thread-local to thread-shared (e.g., sent on a channel), the non-atomic refcount must be migrated to atomic.
- Design decision: static vs. dynamic migration

  (a) Static (recommended): The compiler proves at compile time that a value is either always thread-local or always thread-shared. No runtime migration needed. If uncertain → atomic.

  (b) Dynamic: Store a flag in the header indicating atomic/non-atomic. When a value crosses a thread boundary, flip the flag and issue a memory fence. Adds 1 bit of overhead + branching on every RC operation.

  Recommendation: Option (a) for initial implementation. It’s simpler, has zero runtime overhead, and covers the vast majority of cases. Option (b) is only needed if analysis misses important cases (measure first).
- If using static migration:
  - At channel send: if value is marked non-atomic → compile error or automatic promotion to atomic at compile time
  - At spawn: closure captures analyzed → all captured values promoted to atomic if needed
  - The promotion happens at compile time, not runtime
- `/tpr-review` passed: independent review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed: hygiene review clean. MUST run AFTER `/tpr-review` is clean.
- Subsection close-out (10.3): MANDATORY before starting the next subsection. Run `/improve-tooling` retrospectively on THIS subsection’s debugging journey (per `.claude/skills/improve-tooling/SKILL.md` “Per-Subsection Workflow”): which `diagnostics/` scripts you ran, where you added `dbg!`/`tracing` calls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE `/commit-push` using a valid conventional-commit type (`build(diagnostics): ... — surfaced by section-10.3 retrospective`; `build`/`test`/`chore`/`ci`/`docs` are valid; `tools(...)` is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 10.3: no tooling gaps”. Update this subsection’s `status` in section frontmatter to `complete`.
- `/sync-claude` section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.
- Repo hygiene check: run `diagnostics/repo-hygiene.sh --check` and clean any detected temp files.
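Option (a) amounts to choosing one RC mode per allocation for its entire lifetime, based on every use the compiler can see. The sketch below assumes a hypothetical `Use` enum and `classify` function to show that decision; it mirrors the `ThreadLocality` variants from §10.1 but is not the planned implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadLocality {
    ThreadLocal,
    ThreadShared,
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RcChoice {
    Atomic,
    NonAtomic,
}

/// Hypothetical summary of how an allocation is used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Use {
    Local,
    ChanSend,
    SpawnCapture,
    UnknownFfi,
}

/// Classifies an allocation from all of its uses. Any thread-crossing
/// use makes the allocation shared for its whole lifetime.
pub fn classify(uses: &[Use]) -> ThreadLocality {
    let mut result = ThreadLocality::ThreadLocal;
    for u in uses {
        match u {
            Use::ChanSend | Use::SpawnCapture => return ThreadLocality::ThreadShared,
            Use::UnknownFfi => result = ThreadLocality::Unknown,
            Use::Local => {}
        }
    }
    result
}

/// "If uncertain → atomic": only a proven ThreadLocal value gets non-atomic RC.
pub fn rc_choice(locality: ThreadLocality) -> RcChoice {
    match locality {
        ThreadLocality::ThreadLocal => RcChoice::NonAtomic,
        ThreadLocality::ThreadShared | ThreadLocality::Unknown => RcChoice::Atomic,
    }
}
```

Under this model a value that is used locally and later sent on a channel is atomic from its first increment, so no header flag or runtime fence is needed.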
10.4 Completion Checklist
Test matrix for §10 (write failing tests FIRST, verify they fail, then implement):
| Program pattern | Expected RC variant | Semantic pin |
|---|---|---|
| Single-threaded program with many RC operations | All `ori_rc_inc_nonatomic` / `ori_rc_dec_nonatomic` calls | Yes: zero `ori_rc_inc` (atomic) in LLVM IR |
| Single-threaded program: no `spawn()`, no `channel()` calls | ALL values use non-atomic RC (whole-program optimization) | Yes: zero atomic RC ops in IR |
| Multi-threaded: `spawn()` captures a list | Captured list uses atomic RC; local list in spawned fn uses non-atomic | Yes: split atomic/non-atomic |
| `chan.send(value)`: value crosses channel | `value` promoted to atomic RC before send | Yes: `ori_rc_inc` (atomic) before send |
| Value created AFTER `spawn()` in spawned closure | Non-atomic (thread-local to that thread) | Yes: post-spawn local stays non-atomic |
| Width-specific non-atomic: bounded value + thread-local | `ori_rc_inc_nonatomic_i8` / `ori_rc_dec_nonatomic_i8` | Yes: narrow + non-atomic combined |
| Non-atomic RC correct behavior: single-thread dec to 0 | Value dropped correctly (same semantics as atomic) | Yes: correctness equivalence |
- Write failing test matrix BEFORE implementation (verify tests fail with current all-atomic codegen)
- Single-threaded programs: ALL RC operations use `ori_rc_*_nonatomic` variants
- Multi-threaded programs: only thread-shared values use atomic RC
- Channel sends correctly mark values as thread-shared
- Spawn captures correctly mark captured values as thread-shared
- Width-specific non-atomic variants: `ori_rc_inc_nonatomic_i8`, `ori_rc_dec_nonatomic_i8`, `ori_rc_inc_nonatomic_i16`, `ori_rc_dec_nonatomic_i16` (combines with §09)
- Add semantic pin test: a single-threaded program produces ZERO atomic RC operations in LLVM IR (all ops are `ori_rc_*_nonatomic`). This test can ONLY pass with thread-local analysis enabled.
- Debug builds assert on RC-mode mismatches (atomic API used on non-atomic allocation or vice versa)
- Non-atomic RC operations are measurably faster (benchmark ≥ 20% improvement in RC-heavy workloads)
- `./diagnostics/dual-exec-verify.sh` passes: non-atomic RC produces identical behavior to atomic RC
- Extend `diagnostics/valgrind-aot.sh` to accept an optional `--tool=helgrind` passthrough flag:
  - Add `--helgrind` flag to the script: when present, pass `--tool=helgrind --fair-sched=yes` to valgrind instead of `--tool=memcheck`
  - This is a concrete shell script change, not “invoke manually”
  - File: `diagnostics/valgrind-aot.sh` (modify the valgrind invocation line)
- Run helgrind on AOT binaries compiled from multi-threaded Ori programs (channel + spawn patterns): `./diagnostics/valgrind-aot.sh --helgrind tests/valgrind/threads/`
- Create `tests/valgrind/threads/` directory with at minimum:
  - `thread_local_only.ori`: single-threaded program with many RC operations → no helgrind races
  - `channel_send.ori`: program that sends values through a channel → helgrind must find no races
- `./test-all.sh` green
- `./clippy-all.sh` green
- `./diagnostics/dual-exec-verify.sh` passes
- `/tpr-review` passed: independent Codex review found no critical or major issues (or all findings triaged)
- `/impl-hygiene-review` passed: implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER `/tpr-review` is clean.
- `/improve-tooling` retrospective completed: MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (which `diagnostics/` scripts you ran, which command sequences you repeated, where you added ad-hoc `dbg!`/`tracing` calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE `/commit-push`. The retrospective is mandatory even when nothing felt painful; that is exactly when blind spots accumulate. See `.claude/skills/improve-tooling/SKILL.md` “Retrospective Mode” for the full protocol.
Exit Criteria: A single-threaded benchmark program shows 0 atomic RC operations in LLVM IR (all ori_rc_*_nonatomic). Performance benchmark shows ≥20% improvement in RC-heavy workloads vs. atomic-only baseline.
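One way to check the first exit criterion mechanically is to count RC runtime calls in the emitted textual LLVM IR. The helper below is a rough sketch, not an existing diagnostic: `count_rc_calls` is a hypothetical name, and it assumes the runtime symbols appear in IR as `@ori_rc_inc…` / `@ori_rc_dec…` with the `_nonatomic` infix used in §10.2.

```rust
/// Counts calls to atomic and non-atomic RC runtime functions in
/// textual LLVM IR. Returns `(atomic, nonatomic)`.
///
/// Assumption: symbol names follow the `ori_rc_{inc,dec}[_nonatomic][_width]`
/// scheme described in this section.
pub fn count_rc_calls(ir: &str) -> (usize, usize) {
    let mut atomic = 0;
    let mut nonatomic = 0;
    for line in ir.lines() {
        // Skip declarations and anything that is not a call instruction.
        if !line.contains("call ") {
            continue;
        }
        if line.contains("@ori_rc_inc_nonatomic") || line.contains("@ori_rc_dec_nonatomic") {
            nonatomic += 1;
        } else if line.contains("@ori_rc_inc") || line.contains("@ori_rc_dec") {
            atomic += 1;
        }
    }
    (atomic, nonatomic)
}
```

A semantic pin for a single-threaded program would then assert that the first component is `0` and the second is greater than `0`. A FileCheck `CHECK-NOT` pattern is the more conventional tool for this; the sketch only shows what is being counted.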
10.R Third Party Review Findings
- `[TPR-10-001][major]` `section-10-thread-local-arc.md:117-148`: Non-atomic RC has no debug-mode safety net; analysis wrong → silent UB. The `ori_rc_inc_nonatomic`/`ori_rc_dec_nonatomic` functions use plain loads/stores (lines 120, 124: `*rc_ptr`). If thread escape analysis (§08+§10.1) is unsound for any value, concurrent access produces data race UB. The plan acknowledges this risk (line 111) but proposes no runtime fallback, only helgrind testing as a detection tool. No mechanism exists to verify at runtime that a value classified as thread-local is actually single-threaded. Action: Add a debug-mode `#[cfg(debug_assertions)]` per-allocation flag that records the RC mode (atomic vs non-atomic). Assert on mismatched access (e.g., `ori_rc_inc` called on an allocation marked non-atomic). Zero cost in release builds. This catches analysis bugs during development before they become silent data races in production. Consensus: 3/3 reviewers.
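The per-allocation flag requested by TPR-10-001 could take the side-table form mentioned in §10.2. The following is a minimal sketch under assumptions: `RcModeTable` and its methods are hypothetical names, the table is single-threaded for brevity (a real runtime table would need synchronization and a `#[cfg(debug_assertions)]` gate), and mismatches are returned as `Err` rather than aborting so the behavior is easy to test.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RcMode {
    Atomic,
    NonAtomic,
}

/// Hypothetical debug-only side table: allocation address -> RC mode.
#[derive(Default)]
pub struct RcModeTable {
    modes: HashMap<usize, RcMode>,
}

impl RcModeTable {
    /// Records the mode chosen for an allocation at allocation time.
    pub fn register(&mut self, addr: usize, mode: RcMode) {
        self.modes.insert(addr, mode);
    }

    /// Forgets an allocation when it is freed.
    pub fn unregister(&mut self, addr: usize) {
        self.modes.remove(&addr);
    }

    /// Checks that an RC entry point of kind `used` matches the recorded
    /// mode. Untracked addresses are accepted.
    pub fn check(&self, addr: usize, used: RcMode) -> Result<(), String> {
        match self.modes.get(&addr) {
            Some(&recorded) if recorded != used => Err(format!(
                "RC mode mismatch at {addr:#x}: allocation is {recorded:?}, accessed as {used:?}"
            )),
            _ => Ok(()),
        }
    }
}
```

In a debug build, `ori_rc_inc` would call something like `check(addr, RcMode::Atomic)` and the non-atomic entry points would call `check(addr, RcMode::NonAtomic)`, aborting on `Err`. This detects a mismatched entry point; it does not by itself detect a correctly-tagged non-atomic value being touched from two threads, which is why helgrind remains a secondary verifier.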
HISTORY
- 2026-06-02: RL-19/20/21 NonAtomic-selecting dispatch + its NonAtomic-selection tests relocated IN from `plans/aims-burden-tracking/section-06-phase7-mechanical-lowering.md §06.3` (`routing.md §4`): §06.3 delivered the `RcAtomicity { Atomic, NonAtomic }` carrier SITE on the `RcInc`/`RcDec` `ArcInstr` variants (`compiler/ori_arc/src/ir/instr.rs` + `ir/repr.rs`), defaulted to `Atomic` at every construction site (reproducing the shipped unconditionally-atomic runtime RC primitives bit-for-bit). The dispatch that POPULATES `RcAtomicity::NonAtomic` cannot live in §06.3: it requires §10.1’s `ThreadLocality` thread-escape analysis (`compiler/ori_repr/src/escape/thread.rs`), §10.2’s non-atomic runtime (`ori_rc_inc_nonatomic`/`ori_rc_dec_nonatomic`), and §10.2’s atomic-vs-non-atomic codegen selection (`emit_atomic_rc`/`emit_nonatomic_rc`), all owned HERE. Populating `NonAtomic` in §06.3 without §10’s backend would be an inert no-op (the carrier ignored by the unconditionally-atomic codegen). Per `routing.md §4` MOVE-the-item discipline: §10.2 gains the carrier-population checkbox + the 3 NonAtomic-selection tests (cross-thread-escape FFI-export → Atomic; intra-thread Sendable → NonAtomic; whole-program RL-21 → all NonAtomic). The channel-send `#skip` companion is anchored to BUG-02-037 (channel construction `channel<T>(buffer:)` emits E2040 at the frontend, filed 2026-06-02; the `Producer<T>`/`Consumer<T>` channel-send thread-escape path is blocked until E2040 resolves, so the primary cross-thread exercise is FFI pointer export per RL-21). §06.3 retains only the carrier-default SITE pins; §06.3 checkboxes 1+2 + SITE-tests are `[x]`-DONE. The §10.4 test matrix already covers the whole-program + channel-send + spawn-capture rows; the relocated §10.2 tests reference them.