Section 01: RC Header Extension
Status: Complete
Goal: Extend the RC allocation header to store elem_dec_fn and elem_count, ensuring element cleanup happens regardless of which ori_buffer_rc_dec call reaches zero.
Context: The original RC header was 16 bytes (V3): [data_size: i64 | strong_count: i64]. This section extended it through V4 (24 bytes, adding elem_dec_fn) to V5 (32 bytes, adding elem_count). The final V5 layout is: [data_size: i64 | elem_dec_fn: ptr | elem_count: i64 | strong_count: i64 | data] = 32 bytes. The key invariant is that strong_count remains at data_ptr - 8 — all existing RC operations (ori_rc_inc, ori_rc_dec, ori_rc_count, ori_rc_is_unique, ori_buffer_rc_dec) rely on this.
Reference implementations:
- Swift `HeapObject.h`: Stores type metadata (including destructor) in a 2-word header alongside refcount.
- Lean 4 `lean_object`: 8-byte header with tag + RC, element count in adjacent memory.
01.1 Header Layout Change
File(s): compiler/ori_rt/src/rc/mod.rs
Change the header constant and document the new layout.
V5 offset reference (all offsets in bytes, final layout):
| Field | From base | From data_ptr |
|---|---|---|
| data_size | base + 0 | data - 32 |
| elem_dec_fn | base + 8 | data - 24 |
| elem_count | base + 16 | data - 16 |
| strong_count | base + 24 | data - 8 |
| data | base + 32 | data + 0 |
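The offset table can be restated as constants with compile-time checks. This is a minimal sketch, not the runtime's code: `RC_HEADER_SIZE`, `ELEM_DEC_FN_OFFSET`, and `ELEM_COUNT_OFFSET` are named in this plan, while `DATA_SIZE_OFFSET`, `STRONG_COUNT_OFFSET`, and `offset_from_base` are hypothetical names introduced for illustration.

```rust
// V5 header offsets, measured in bytes back from the data pointer.
pub const RC_HEADER_SIZE: usize = 32;
pub const DATA_SIZE_OFFSET: usize = 32; // hypothetical name: data_ptr - 32
pub const ELEM_DEC_FN_OFFSET: usize = 24; // data_ptr - 24
pub const ELEM_COUNT_OFFSET: usize = 16; // data_ptr - 16
pub const STRONG_COUNT_OFFSET: usize = 8; // hypothetical name: data_ptr - 8

/// Converts a "from data_ptr" byte offset into a "from base" byte offset.
/// Hypothetical helper, used only to cross-check the two table columns.
pub const fn offset_from_base(offset_from_data: usize) -> usize {
    RC_HEADER_SIZE - offset_from_data
}

// Compile-time checks that both columns of the table agree.
const _: () = assert!(offset_from_base(DATA_SIZE_OFFSET) == 0);
const _: () = assert!(offset_from_base(ELEM_DEC_FN_OFFSET) == 8);
const _: () = assert!(offset_from_base(ELEM_COUNT_OFFSET) == 16);
const _: () = assert!(offset_from_base(STRONG_COUNT_OFFSET) == 24);
```

Keeping `strong_count` at a fixed distance of 8 bytes from the data pointer is what lets the existing RC operations stay unchanged across V3, V4, and V5.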
Note: items below reference V4 (24-byte) values from the initial implementation. These were superseded by V5 (32-byte) in 01.4 (TPR-01-001). All items are complete.
- Change `RC_HEADER_SIZE` from `16` to `24`
- Update the doc comment on `RC_HEADER_SIZE` (line 97-98, currently “V3 layout: … The header is 16 bytes: 8 for `data_size` + 8 for `strong_count`.”) to “V4 layout: `[data_size: i64 | elem_dec_fn: *const () | strong_count: i64 | data ...]` The header is 24 bytes: 8 for `data_size` + 8 for `elem_dec_fn` + 8 for `strong_count`.”
- Update `ori_rc_inc` doc comment (line 101-108): references `strong_count` at `data_ptr - 8`, which is still correct in V4, but add a note that `data_ptr - 16` is now `elem_dec_fn` (not `data_size`)
- Add a constant `ELEM_DEC_FN_OFFSET` = 16 (byte offset from data pointer to `elem_dec_fn` field, i.e., `data_ptr.sub(ELEM_DEC_FN_OFFSET)` reaches `elem_dec_fn` at base + 8)
- Add helper functions:
  - `fn store_elem_dec_fn(data: *mut u8, f: Option<extern "C" fn(*mut u8)>)`: writes `elem_dec_fn` at `data - 16` (data is at base + 24, elem_dec_fn is at base + 8, so offset = 16)
  - `fn load_elem_dec_fn(data: *mut u8) -> Option<extern "C" fn(*mut u8)>`: reads from `data - 16`
- Verify: `strong_count` remains at `data - 8` (base + 16 in V4). All existing RC operations (`ori_rc_inc`, `ori_rc_dec`, `ori_rc_count`, `ori_rc_is_unique`) must continue working without changes to their pointer arithmetic
- Verify: `IsShared` LLVM IR emission in `arc_emitter` uses GEP i8 with `-8` offset to reach refcount. This is correct in V4 since strong_count stays at data - 8. Test at `arc_emitter/tests.rs` line ~689 asserts this. No code change needed, but verify the test passes.
- Add unit tests for helper functions: round-trip store/load, NULL initial value, verify strong_count at data - 8 is unaffected (covered by `test_rc_header_is_24_bytes` in `runtime_tests.rs`)
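The helper signatures above can be sketched as follows. This is an illustration under stated assumptions, not the code in `rc/mod.rs`: it uses the final V5 offset (24; the V4 value in the items above was 16), it assumes the header is 8-byte aligned, and `ElemDecFn`, `noop_dec`, and `roundtrip_demo` are hypothetical names. It relies on Rust's guarantee that `Option` of a function pointer represents `None` as a null pointer, so a zeroed field reads back as `None`.

```rust
/// Element decrement callback type (the alias name is hypothetical).
pub type ElemDecFn = extern "C" fn(*mut u8);

/// V5 value; this was 16 in the V4 layout.
pub const ELEM_DEC_FN_OFFSET: usize = 24;

/// # Safety
/// `data` must point just past a valid, 8-byte-aligned RC header.
pub unsafe fn store_elem_dec_fn(data: *mut u8, f: Option<ElemDecFn>) {
    unsafe { data.sub(ELEM_DEC_FN_OFFSET).cast::<Option<ElemDecFn>>().write(f) }
}

/// # Safety
/// Same requirements as `store_elem_dec_fn`.
pub unsafe fn load_elem_dec_fn(data: *mut u8) -> Option<ElemDecFn> {
    unsafe { data.sub(ELEM_DEC_FN_OFFSET).cast::<Option<ElemDecFn>>().read() }
}

extern "C" fn noop_dec(_elem: *mut u8) {}

/// Returns (field initially NULL, round-trip matched, strong_count after store).
pub fn roundtrip_demo() -> (bool, bool, i64) {
    let mut buf = [0u64; 5]; // 4 header words + 1 data word, zeroed
    let f: ElemDecFn = noop_dec;
    unsafe {
        let data = buf.as_mut_ptr().cast::<u8>().add(32);
        data.sub(8).cast::<i64>().write(1); // strong_count stays at data - 8
        let initially_null = load_elem_dec_fn(data).is_none();
        store_elem_dec_fn(data, Some(f));
        let matched = load_elem_dec_fn(data).map(|g| g as usize) == Some(f as usize);
        let strong = data.sub(8).cast::<i64>().read();
        (initially_null, matched, strong)
    }
}
```

The demo mirrors the three unit-test goals listed above: NULL initial value, store/load round trip, and an untouched `strong_count`.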
Cleanup
- [STYLE] `compiler/ori_rt/src/rc/mod.rs:43`: Remove decorative banner `// ── Reference Counting (V3: 16-byte header, data-pointer style) ──────────`. Replace with plain section comment `// Reference Counting` per hygiene rules (no decorative characters).
- [STYLE] `compiler/ori_rt/src/rc/mod.rs:93`: Remove decorative banner `// ── Core RC Functions ────────────────────────────────────────────────`. Replace with plain `// Core RC Functions`.
- [STYLE] `compiler/ori_rt/src/rc/debug.rs:20,136,221,340`: Remove 4 decorative banners (`// ── RC Event Tracing`, `// ── Leak Attribution`, `// ── Runtime Assertion Mode`, `// ── Leak Detection`). Replace with plain section comments.
- [STYLE] `compiler/ori_rt/src/map/hash_table.rs:39,93,113,180,211,296`: Remove 6 decorative banners (`// ── Layout Computation`, `// ── Metadata Operations`, `// ── Probing`, `// ── Capacity Management`, `// ── Rehashing`, `// ── Counting`). Replace with plain section comments.
01.2 Allocation Functions
File(s): compiler/ori_rt/src/rc/allocate.rs
Update all allocation functions to account for the larger header.
- `ori_rc_alloc`: Initialize `elem_dec_fn` field to NULL (zero) at `base + 8`. Move `strong_count` init from `base + 8` (line 46: `base.add(8)`) to `base + 16` (change to `base.add(16)`). Add a new `base.add(8)` line to zero-initialize `elem_dec_fn`. `data_size` write at `base + 0` stays the same. Data pointer returned is now `base + 24` (via `base.add(RC_HEADER_SIZE)`; automatic since `RC_HEADER_SIZE` changes to 24).
- `ori_rc_free`: Already uses `data_ptr.sub(RC_HEADER_SIZE)` and `size + RC_HEADER_SIZE`, so no pointer arithmetic changes are needed; just verify the updated constant propagates correctly.
- `ori_rc_realloc`: Already uses `data_ptr.sub(RC_HEADER_SIZE)` and `RC_HEADER_SIZE` for total sizes, so no pointer arithmetic changes are needed. Verify the realloc preserves all 24 header bytes (data_size + elem_dec_fn + strong_count). Added SAFETY comment for elem_dec_fn preservation.
- `ori_rc_data_size`: Uses `data_ptr.sub(RC_HEADER_SIZE).cast::<i64>()`, so no code change is needed since `RC_HEADER_SIZE` is updated from 16 to 24 in step 01.1. Verify the constant propagates correctly.
- Update all `// SAFETY:` comments that reference “16 bytes” to say “24 bytes”
- Update `compiler/ori_llvm/src/aot/debug/builder_scope.rs`: The `create_rc_heap_type` function (line ~327) has hardcoded DWARF debug info offsets for the V3 layout. Must add `elem_dec_fn` field at `offset_bits: 64`, move `strong_count` to `offset_bits: 128`, move `data` to `offset_bits: 192`, and update `total_size` from `128 + inner_size_bits` to `192 + inner_size_bits`.
- Update `compiler/ori_llvm/src/tests/runtime_tests.rs`: The `test_rc_header_is_16_bytes` test renamed to `test_rc_header_is_24_bytes` and updated for V4: `data_ptr - 8` = strong_count (unchanged), `data_ptr - 16` = elem_dec_fn (new, verify NULL), `data_ptr - 24` = data_size (moved). Includes `ori_rc_data_size()` verification and elem_dec_fn NULL verification + realloc preservation.
- Update `compiler/ori_rt/src/tests.rs`: Tests `rc_inc_skips_at_max_refcount` and `rc_dec_skips_at_max_refcount` use `ptr.sub(8)` for strong_count. These are CORRECT and unchanged in V4; verified they pass.
- Update `compiler/ori_rt/src/rc/allocate.rs` doc comments and SAFETY comments: all references to “16 bytes” or “16-byte header” updated to 24.
- Update `compiler/ori_rt/src/list/reset/mod.rs`: `is_unique` replaced with `ori_rc_is_unique` call (data race fix).
- Update `compiler/ori_rt/src/rc/debug.rs` line 244: `data_ptr.sub(8).cast::<i64>().read()` reads strong_count and remains correct. Verified.
- Update `docs/compiler/design/11-runtime/data-structures.md`: references updated for V4 header size, layout diagrams, offset table, and mermaid diagram. New `elem_dec_fn` field documented.
- Update `docs/compiler/design/11-runtime/reference-counting.md`: layout diagrams, struct definitions, offset references, and size references updated for V4.
- Update `docs/compiler/design/11-runtime/index.md`: references updated for V4 header. Fixed backwards offset description.
- Update `compiler/ori_rt/src/map/hash_table.rs` lines 10, 16: doc comments updated to “24 bytes”.
- Update `docs/compiler/design/11-runtime/string-sso.md`: references updated to “24-byte RC header”.
- Update `docs/compiler/design/11-runtime/collections-cow.md`: references updated to 24.
- Add unit test: allocate, store elem_dec_fn, verify it’s preserved after realloc (covered by `test_rc_header_is_24_bytes` in `runtime_tests.rs`)
- Verify `ori_rc_realloc` preserves `elem_dec_fn`: verified and documented with SAFETY comment.
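To make the header initialization concrete, here is a minimal sketch of an allocator that writes the final V5 header (the items above describe the intermediate V4 step, where `strong_count` sat at `base + 16`). It is not the real `ori_rc_alloc`: the names `sketch_rc_alloc`, `sketch_rc_free`, `layout_for`, and `alloc_demo` are hypothetical, alignment is fixed at 8 bytes, and the initial `strong_count` of 1 is an assumption.

```rust
use std::alloc::{alloc, dealloc, Layout};

pub const RC_HEADER_SIZE: usize = 32; // V5

fn layout_for(size: usize) -> Layout {
    // Assumption: 8-byte alignment is enough for the header and the payload.
    Layout::from_size_align(size + RC_HEADER_SIZE, 8).expect("valid layout")
}

/// Allocates `size` payload bytes behind a V5 header and returns the data pointer.
pub fn sketch_rc_alloc(size: usize) -> *mut u8 {
    unsafe {
        let base = alloc(layout_for(size));
        if base.is_null() {
            return base;
        }
        base.cast::<i64>().write(size as i64); // data_size at base + 0
        base.add(8).cast::<usize>().write(0); // elem_dec_fn = NULL at base + 8
        base.add(16).cast::<i64>().write(0); // elem_count = 0 at base + 16
        base.add(24).cast::<i64>().write(1); // strong_count at base + 24 (assumed 1)
        base.add(RC_HEADER_SIZE)
    }
}

/// # Safety
/// `data` must come from `sketch_rc_alloc(size)` and must not be used afterwards.
pub unsafe fn sketch_rc_free(data: *mut u8, size: usize) {
    unsafe { dealloc(data.sub(RC_HEADER_SIZE), layout_for(size)) }
}

/// Returns (data_size, elem_dec_fn bits, elem_count, strong_count) of a fresh buffer.
pub fn alloc_demo() -> (i64, usize, i64, i64) {
    let data = sketch_rc_alloc(64);
    assert!(!data.is_null());
    unsafe {
        let fields = (
            data.sub(32).cast::<i64>().read(),
            data.sub(24).cast::<usize>().read(),
            data.sub(16).cast::<i64>().read(),
            data.sub(8).cast::<i64>().read(),
        );
        sketch_rc_free(data, 64);
        fields
    }
}
```

Because free and realloc address the header through `RC_HEADER_SIZE` rather than a literal, changing the constant is enough for them, which matches the "no pointer arithmetic changes needed" notes above.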
Forward dependency for Section 02: The header change in Section 01 only creates the infrastructure (store_elem_dec_fn/load_elem_dec_fn) and updates existing ori_buffer_rc_dec to read from the header. Section 02 must ensure elem_dec_fn is STORED in the header for every buffer that contains fat-pointer elements. This includes not just codegen-level list construction, but also ALL runtime functions that create new list buffers via ori_rc_alloc:
| File | Functions | Notes |
|---|---|---|
| list/cow.rs | ori_list_push_cow, ori_list_pop_cow, ori_list_set_cow | Slow path creates new buffer |
| list/cow_structural.rs | ori_list_insert_cow, ori_list_remove_cow | Creates new buffer on slow path |
| list/cow_sort.rs | ori_list_concat_cow, ori_list_reverse_cow, ori_list_sort_cow, ori_list_sort_stable_cow | Multiple ori_rc_alloc calls |
| list/query.rs | ori_list_reverse, ori_list_concat | Creates new buffers via ori_rc_alloc |
| list/slice.rs | ori_list_materialize_slice | Materializes slice into owned buffer |
| list/mod.rs | ori_list_alloc_data, ori_list_ensure_capacity, ori_list_new, ori_list_push_new, write_array_to_list | Core allocation paths |
| list/reset/mod.rs | ori_list_reset_buffer | Creates new buffer when reuse fails |
| iterator/consumers.rs | ori_iter_collect, ori_iter_collect_set | Creates output buffer for collect (list and set) |
| set/mod.rs | alloc_set_hash_buffer, ori_set_to_list | Hash table buffer + list buffer creation |
| map/mod.rs, map/hash_table.rs | ori_map_keys_to_list, ori_map_values_to_list, rehash_set, map alloc | Map hash table buffers + list buffer creation |
| string/ops.rs | ori_str_split | Creates [str] list buffer via direct ori_rc_alloc |
| lib.rs | ori_args_from_argv | Creates [str] list buffer for @main(args: [str]) |
Each of these must propagate elem_dec_fn to the new buffer. The two approaches are: (a) the runtime function reads elem_dec_fn from the OLD buffer’s header and writes it to the NEW buffer’s header (recommended for COW functions that have access to the old buffer), or (b) the codegen always passes real elem_dec_fn to ori_buffer_rc_dec for the new buffer’s type, and the first non-NULL write populates the header (works for collect and alloc_data where there is no “old buffer”). Section 02 must enumerate and handle each case.
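Approach (a) amounts to copying one header word from the old buffer to the new one. The following is a hedged sketch only: `propagate_elem_dec_fn`, `slot`, `ElemDecFn`, and `propagate_demo` are hypothetical names, the V5 offset is assumed, and the real COW functions would do this alongside their element copy.

```rust
/// Element decrement callback type (the alias name is hypothetical).
pub type ElemDecFn = extern "C" fn(*mut u8);

pub const RC_HEADER_SIZE: usize = 32; // V5
pub const ELEM_DEC_FN_OFFSET: usize = 24; // V5

/// Pointer to the `elem_dec_fn` header field of the buffer behind `data`.
unsafe fn slot(data: *mut u8) -> *mut Option<ElemDecFn> {
    unsafe { data.sub(ELEM_DEC_FN_OFFSET).cast() }
}

/// Approach (a): read `elem_dec_fn` from the OLD header, write it to the NEW one.
///
/// # Safety
/// Both pointers must point just past valid, 8-byte-aligned RC headers.
pub unsafe fn propagate_elem_dec_fn(old_data: *mut u8, new_data: *mut u8) {
    unsafe { slot(new_data).write(slot(old_data).read()) }
}

extern "C" fn noop_dec(_elem: *mut u8) {}

/// Returns true if the new buffer ends up with the old buffer's callback.
pub fn propagate_demo() -> bool {
    let mut old_buf = [0u64; 5]; // 4 header words + 1 data word
    let mut new_buf = [0u64; 5];
    let f: ElemDecFn = noop_dec;
    unsafe {
        let old_data = old_buf.as_mut_ptr().cast::<u8>().add(RC_HEADER_SIZE);
        let new_data = new_buf.as_mut_ptr().cast::<u8>().add(RC_HEADER_SIZE);
        slot(old_data).write(Some(f));
        propagate_elem_dec_fn(old_data, new_data);
        slot(new_data).read().map(|g| g as usize) == Some(f as usize)
    }
}
```

Approach (b) needs no such copy, since the first `ori_buffer_rc_dec` call with a non-NULL parameter populates the header; that is why it suits paths such as collect, where no old buffer exists.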
Cleanup
- [WASTE] `compiler/ori_rt/src/rc/mod.rs:43-68`: V3 layout comment condensed to compact V4 layout diagram.
- [STYLE] `compiler/ori_rt/src/rc/allocate.rs:1-4`: Module doc updated to V4 and references the new layout.
- [WASTE] `compiler/ori_rt/src/rc/allocate.rs:168-174`: `rc_trace_realloc()` helper added to `debug.rs` and called from `ori_rc_realloc`.
- [WASTE] `compiler/ori_rt/src/list/reset/mod.rs:69-76`: `is_unique()` replaced with `ori_rc_is_unique()` call (data race fix).
- [WASTE] `compiler/ori_rt/src/rc/map_rc.rs:92`: `_len` parameter removed from `map_buffer_cleanup`.
01.3 RC Dec Functions
File(s): compiler/ori_rt/src/rc/mod.rs, compiler/ori_rt/src/rc/list_rc.rs
Update ori_buffer_rc_dec and related functions to use the stored elem_dec_fn.
- `ori_buffer_rc_dec(data, len, cap, elem_size, elem_dec_fn)`: `store_elem_dec_fn_once` before dec, `drop_elements_and_free` reads from header via `load_elem_dec_fn`.
- Both `#[cfg]` branches must be updated: Both branches updated for list_rc, set_rc. Store elem_dec_fn in header before dec, read from header in drop path.
- Thread safety of write-once pattern: `store_elem_dec_fn_once` uses AtomicPtr CAS on non-single-threaded path. Plain pointer write on single-threaded path.
- `ori_buffer_drop_unique(data, len, cap, elem_size, elem_dec_fn)`: Stores to header, reads from header for cleanup.
- `slice_buffer_rc_dec`: Stores/reads on ORIGINAL buffer’s header (not slice data).
- `ori_set_buffer_rc_dec` in `compiler/ori_rt/src/rc/set_rc.rs`: Store-then-read pattern via `set_buffer_cleanup`. Both `#[cfg]` branches and `ori_set_buffer_drop_unique` updated.
- `ori_map_buffer_rc_dec` in `compiler/ori_rt/src/rc/map_rc.rs`: Decision: option (c), codegen-based, not header-based. Maps keep using parameters (two cleanup functions cannot fit in single header slot). Recommended approach documented.
- Recommended approach for maps (option c): Documented in plan. Maps use codegen-based approach, not header-based.
- Add unit tests:
  - `semantic_pin_header_based_element_cleanup`: RC 2→1 with real fn, then 1→0 with NULL; header fn used
  - `elem_dec_fn_null_initialized`: alloc → immediate load → NULL
  - `elem_dec_fn_store_and_load_roundtrip`: store non-NULL → load matches
  - `elem_dec_fn_store_once_first_non_null_wins`: first wins, second is no-op
  - `elem_dec_fn_store_once_null_is_noop`: store_once(None) doesn’t change NULL
  - `elem_dec_fn_preserved_through_realloc`: store → realloc → load matches
  - `drop_unique_reads_from_header`: pre-stored fn, call with NULL param → header fn used
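The write-once pattern on the non-single-threaded path can be sketched with `AtomicPtr::compare_exchange`. This is an illustration under assumptions rather than the code in `list_rc.rs`: the V5 offset is used, the memory orderings are a guess, and `slot`, `first_dec`, `second_dec`, and `store_once_demo` are hypothetical names. The plain-write variant for the single-threaded path is not shown.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Element decrement callback type (the alias name is hypothetical).
pub type ElemDecFn = extern "C" fn(*mut u8);

pub const RC_HEADER_SIZE: usize = 32; // V5
pub const ELEM_DEC_FN_OFFSET: usize = 24; // V5

/// Views the `elem_dec_fn` header word as an atomic pointer.
unsafe fn slot<'a>(data: *mut u8) -> &'a AtomicPtr<()> {
    unsafe { &*data.sub(ELEM_DEC_FN_OFFSET).cast::<AtomicPtr<()>>() }
}

/// Stores `f` only if the field is still NULL; `None` is a no-op.
///
/// # Safety
/// `data` must point just past a valid, 8-byte-aligned RC header.
pub unsafe fn store_elem_dec_fn_once(data: *mut u8, f: Option<ElemDecFn>) {
    let Some(f) = f else { return };
    let new = f as *const () as *mut ();
    // The CAS fails harmlessly when another caller already stored a function.
    let _ = unsafe { slot(data) }.compare_exchange(
        std::ptr::null_mut(),
        new,
        Ordering::AcqRel,
        Ordering::Acquire,
    );
}

/// # Safety
/// Same requirements as `store_elem_dec_fn_once`.
pub unsafe fn load_elem_dec_fn(data: *mut u8) -> Option<ElemDecFn> {
    let raw = unsafe { slot(data) }.load(Ordering::Acquire);
    // A null pointer maps to `None` for `Option` of a function pointer.
    unsafe { std::mem::transmute::<*mut (), Option<ElemDecFn>>(raw) }
}

extern "C" fn first_dec(elem: *mut u8) {
    unsafe { elem.write(1) }
}

extern "C" fn second_dec(elem: *mut u8) {
    unsafe { elem.write(2) }
}

/// Returns the marker written by whichever callback ended up in the header.
pub fn store_once_demo() -> u8 {
    let mut buf = [0u64; 5]; // 4 header words + 1 data word, zeroed
    let mut marker = 0u8;
    unsafe {
        let data = buf.as_mut_ptr().cast::<u8>().add(RC_HEADER_SIZE);
        store_elem_dec_fn_once(data, None); // no-op, field stays NULL
        store_elem_dec_fn_once(data, Some(first_dec as ElemDecFn)); // wins
        store_elem_dec_fn_once(data, Some(second_dec as ElemDecFn)); // ignored
        if let Some(f) = load_elem_dec_fn(data) {
            f(&mut marker);
        }
    }
    marker
}
```

The demo covers the same cases as `elem_dec_fn_store_once_null_is_noop` and `elem_dec_fn_store_once_first_non_null_wins`: the marker is written by `first_dec`, so the function returns 1.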
01.4 Slice-Aware Functions
File(s): compiler/ori_rt/src/rc/list_rc.rs, compiler/ori_rt/src/slice_encoding/mod.rs
Ensure seamless slices correctly interact with the new header.
- `ori_list_rc_inc`: Verified no code change needed; strong_count remains at `data_ptr - 8` in V4.
- Verify `is_slice_cap(cap)` still works: SLICE_FLAG is in cap, not in the header. Verified.
- Verify `slice_original_data(data, cap)` still correctly computes the original buffer’s data pointer. Doc comments updated to reference `RC_HEADER_SIZE` instead of hardcoded “16”.
- Test: create a list, create a slice, call `ori_buffer_rc_dec` on the slice, and verify the original buffer’s elem_dec_fn is used for element cleanup (`slice_uses_original_buffers_elem_dec_fn`)
- [TPR-01-001] Fix `slice_buffer_rc_dec` to clean up ALL initialized elements when slice is last owner: V5 header with `elem_count` field. (2026-03-20)
  - Add `ELEM_COUNT_OFFSET` constant (byte offset from data_ptr to `elem_count` field)
  - Add `store_elem_count`/`load_elem_count` helpers
  - Expand `RC_HEADER_SIZE` from 24 to 32 bytes, update header layout
  - Update `ori_rc_alloc` to zero-initialize `elem_count` and adjust offsets
  - Update `ori_rc_free`, `ori_rc_realloc`, `ori_rc_data_size` for 32-byte header
  - Update `ori_buffer_rc_dec` to store caller’s `len` as `elem_count` in header (non-slice path)
  - Update `slice_buffer_rc_dec` to read `elem_count` from header and iterate from `original_data` for `elem_count` elements
  - Update DWARF debug info in `create_rc_heap_type` for V5 layout
  - Update all doc comments referencing “24 bytes” to “32 bytes”
  - Add semantic pin test: slice outliving original, verify all elements cleaned up (4 tests: middle, end, empty, full-range)
  - Valgrind/AOT test: `tests/valgrind/slice_str_outlives_original.ori` with 5 patterns (middle, end, beginning, empty, single-element slice), all Valgrind clean + `ORI_CHECK_LEAKS=1` clean (2026-03-20)
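The TPR-01-001 fix can be illustrated with a small sketch: when a slice is the last owner, cleanup walks the ORIGINAL buffer for `elem_count` elements rather than the visible slice range. This is an assumption-laden illustration, not the code in `slice_buffer_rc_dec`: `drop_all_elements`, `mark_dropped`, and `slice_cleanup_demo` are hypothetical names, the callback is assumed to receive a pointer to each element, and freeing the buffer is omitted.

```rust
/// Element decrement callback type (the alias name is hypothetical).
pub type ElemDecFn = extern "C" fn(*mut u8);

pub const RC_HEADER_SIZE: usize = 32; // V5
pub const ELEM_DEC_FN_OFFSET: usize = 24; // V5
pub const ELEM_COUNT_OFFSET: usize = 16; // V5

/// Runs the stored callback over every initialized element of the original
/// buffer and returns how many elements were visited.
///
/// # Safety
/// `original_data` must point just past a valid V5 header whose `elem_count`
/// matches the number of initialized `elem_size`-byte elements.
pub unsafe fn drop_all_elements(original_data: *mut u8, elem_size: usize) -> usize {
    unsafe {
        let count = original_data.sub(ELEM_COUNT_OFFSET).cast::<i64>().read() as usize;
        let dec = original_data
            .sub(ELEM_DEC_FN_OFFSET)
            .cast::<Option<ElemDecFn>>()
            .read();
        if let Some(f) = dec {
            for i in 0..count {
                f(original_data.add(i * elem_size));
            }
        }
        count
    }
}

/// Stand-in for a real decrement: marks the element as dropped by zeroing it.
extern "C" fn mark_dropped(elem: *mut u8) {
    unsafe { elem.cast::<u64>().write(0) }
}

/// A slice might cover only elements 1..3, but all 4 elements must be cleaned.
pub fn slice_cleanup_demo() -> usize {
    let mut buf = [0u64; 8]; // 4 header words + 4 elements
    buf[4..].fill(7);
    let f: ElemDecFn = mark_dropped;
    unsafe {
        let data = buf.as_mut_ptr().cast::<u8>().add(RC_HEADER_SIZE);
        data.sub(ELEM_COUNT_OFFSET).cast::<i64>().write(4);
        data.sub(ELEM_DEC_FN_OFFSET)
            .cast::<Option<ElemDecFn>>()
            .write(Some(f));
        drop_all_elements(data, 8);
    }
    buf[4..].iter().filter(|&&e| e == 0).count()
}
```

Iterating by `slice_len` instead of the header's `elem_count` would leave the elements outside the slice range untouched, which is the leak described in TPR-01-001.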
01.R Third Party Review Findings
- [TPR-01-001][high] `compiler/ori_rt/src/rc/list_rc.rs:157`: `slice_buffer_rc_dec` still leaks child RCs for elements outside the visible slice when the slice is the last owner. Resolved: Validated on 2026-03-20. Accepted as a real issue. `drop_elements_and_free` uses `slice_len` (not original buffer’s element count) so elements outside the slice range leak their RC children. Implementation task added to 01.4: extend header with `elem_count` field for full-range cleanup.
- [TPR-01-002][medium] `plans/rc-header-elem-dec/section-01-rc-header.md:203`: Section 01 marks verification complete even though the recorded valgrind result still has five failing cases waived as “pre-existing” and “unrelated.” Resolved: Validated on 2026-03-20. Accepted: all 5 failures confirmed still present (invalid reads/frees, not just leaks). Completion item un-checked and reworded. 5 COW/TRMC valgrind failures added as open items in 01.N.
- [TPR-01-003][high] `compiler/ori_arc/src/aims/intraprocedural/block.rs:269`: The cross-block `Project` liveness repair still misses `Let` aliases of projected values. Resolved: Validated and fixed on 2026-03-21. Created `compute_project_alias_sources()` to precompute function-wide map from (Project destination + transitive Let aliases) → Project source. `propagate_project_source_demand()` now uses this map to propagate demand for any alias of a Project destination, not just direct destinations. 5 unit tests + 1 Valgrind regression test (`tests/valgrind/project_let_alias_cross_block.ori`) with 5 patterns (field-in-then, field-in-else, transitive-alias, branch-both, alias-in-loop).
- [TPR-01-004][high] `compiler/ori_arc/src/aims/intraprocedural/block.rs:285`: The new `Project` alias closure still stops at `Let` aliases and does not propagate through jump arguments into successor block params. Resolved: Validated and fixed on 2026-03-21. Extended `compute_project_alias_sources()` Step 2 fixed-point loop to propagate through `Jump { args }` → target block params (phi-edge renaming), matching the pattern in `collect_param_borrowed_vars()`. 4 unit tests (direct jump, transitive chain, let-then-jump, loop header) + 1 semantic pin test (`project_block_param_cross_block_propagates_source_demand`) + 1 Valgrind regression test (`tests/valgrind/project_block_param_cross_block.ori`) with 5 patterns (merge, nested-merge, loop-header, both-fields-merge, field-with-computation). All 1006 ori_arc tests pass, 13492 total tests pass, Valgrind clean.
01.N Completion Checklist
- `RC_HEADER_SIZE == 32` in `compiler/ori_rt/src/rc/mod.rs` (V5, was 24 in V4)
- `ELEM_DEC_FN_OFFSET == 24` constant (byte offset from data_ptr to elem_dec_fn, was 16 in V4)
- `ELEM_COUNT_OFFSET == 16` constant added (V5: byte offset from data_ptr to elem_count)
- All allocation functions (`ori_rc_alloc`, `ori_rc_free`, `ori_rc_realloc`) use 32-byte header
- `ori_rc_alloc` zero-initializes `elem_dec_fn` at `base + 8`, `elem_count` at `base + 16`, and writes `strong_count` at `base + 24`
- `ori_buffer_rc_dec` stores non-NULL elem_dec_fn in header and reads from header at cleanup time
- `ori_buffer_drop_unique` reads `elem_dec_fn` from header (not parameter)
- `slice_buffer_rc_dec` stores/reads on ORIGINAL buffer’s header (not slice data)
- `ori_set_buffer_rc_dec` and `ori_set_buffer_drop_unique` use same store-then-read pattern
- Map strategy decided and implemented (codegen-based, option c)
- `store_elem_dec_fn`/`load_elem_dec_fn` use atomic CAS for thread safety (non-single-threaded path)
- All dual `#[cfg]` branches updated (non-single-threaded + single-threaded) in list_rc.rs, set_rc.rs
- DWARF debug info in `create_rc_heap_type` updated for V5 layout
- `test_rc_header_is_32_bytes` test updated for V5 layout (was `test_rc_header_is_24_bytes` in V4)
- All hardcoded “24” references in doc comments updated to 32 (allocate.rs, slice_encoding, hash_table.rs, mod.rs, docs/ including string-sso.md and collections-cow.md)
- `ori_rc_realloc` SAFETY comment updated; `elem_dec_fn` preservation through realloc documented
- All `data_ptr.sub(8)` for strong_count confirmed still correct (spot-check list_rc.rs, map_rc.rs, set_rc.rs, mod.rs, debug.rs, reset/mod.rs, tests.rs)
- `IsShared` GEP -8 offset in `arc_emitter/instr_dispatch.rs:329` verified correct in V4
- Semantic pin test: buffer with stored `elem_dec_fn`, dec with NULL parameter, cleanup happens via header
- `reset/mod.rs` `is_unique()` replaced with `ori_rc_is_unique()` call (data race fix)
- `map_buffer_cleanup` `_len` parameter removed
- `ori_rc_realloc` trace uses `rc_trace_realloc()` helper (not raw `eprintln!`)
- All decorative banners removed from `mod.rs`, `debug.rs`, `hash_table.rs`
- All RC runtime tests pass (`timeout 150 cargo test -p ori_rt`): 343 passed
- No regressions in AOT tests (`timeout 150 cargo test -p ori_llvm --test aot`): 1729 passed
- Valgrind clean on existing heap-allocating tests: 73/75 pass via `valgrind-aot.sh` on 2026-03-21. 2 pre-existing slice COW failures (`cow_list_slice.ori`, `slice_operations.ori`) are not related to RC header changes and are tracked as pre-existing AOT slice bugs.
  - `tests/valgrind/cow/cow_leak_scenarios.ori`: fixed 2026-03-21: `key_inc`/`val_inc` in `ori_map_insert_cow` after copy. `ORI_CHECK_LEAKS=1` clean.
  - `tests/valgrind/cow/cow_map_insert_remove.ori`: fixed 2026-03-21: same fix. `ORI_CHECK_LEAKS=1` clean.
  - `tests/valgrind/cow/cow_nested.ori`: fixed 2026-03-21: same fix. `ORI_CHECK_LEAKS=1` clean.
  - `tests/valgrind/cow/cow_sharing.ori`: 0 errors, 0 leaks (verified 2026-03-20)
  - `tests/valgrind/trmc/trmc_tree_ops.ori`: 0 errors, 0 leaks (fixed 2026-03-20: AIMS backward analysis now propagates Project source demand through cross-block lifetimes)