§02 String Indexing Codegen
Goal
Fix the AOT crash on s[i] string indexing. The type checker and ARC lowering already support it. Only the LLVM codegen handler and runtime function are missing.
Bug Analysis
Reproducer: Any .ori file with s[i] compiled via ori build
Crash trace:
- Type checker approves str[int] → emits __index protocol call
- ARC lowering (lower_index() in ori_arc/src/lower/collections/mod.rs:160-188) emits __index(receiver, index) correctly
- LLVM codegen reaches try_emit_protocol() in apply_protocols.rs:79-103
- match &type_info has TypeInfo::List and TypeInfo::Map but no TypeInfo::Str case
- Falls to _ => wildcard → returns None → variable never defined
- Downstream code tries to use undefined variable → index-out-of-bounds panic
The fix follows the exact pattern of emit_list_index() — the code journey J10 confirms this pattern scores 10/10.
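The failure mode in the trace can be modeled in isolation. The sketch below is a simplified illustration, not the compiler's code: the TypeInfo enum, the function names, and the string placeholders standing in for emitted values are all hypothetical.

```rust
// Hypothetical, simplified stand-in for the compiler's TypeInfo; not the real definition.
#[derive(Debug)]
enum TypeInfo {
    List,
    Map,
    Str,
}

/// Models the dispatch before the fix: `Str` has no arm, so it falls
/// through to the wildcard and no value is produced.
fn try_emit_index_before(type_info: &TypeInfo) -> Option<&'static str> {
    match type_info {
        TypeInfo::List => Some("list_elem"),
        TypeInfo::Map => Some("map_value"),
        _ => None,
    }
}

/// Models the dispatch after the fix: `Str` gets its own arm.
fn try_emit_index_after(type_info: &TypeInfo) -> Option<&'static str> {
    match type_info {
        TypeInfo::List => Some("list_elem"),
        TypeInfo::Map => Some("map_value"),
        TypeInfo::Str => Some("str_codepoint"),
    }
}
```

In this model, the None returned for Str corresponds to the variable that is never defined and later triggers the panic downstream.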
Implementation
Step 1: Add ori_str_index Runtime Function
File: compiler/ori_rt/src/string/ops.rs
Add a function that indexes a string by codepoint position and returns a single-codepoint string:
/// Index a string by codepoint position.
///
/// Returns a new OriStr containing the single codepoint at position `index`.
/// Panics if `index < 0` or `index >= codepoint_count`.
///
/// # Safety
/// `out_ptr` must point to a valid OriStr-sized allocation.
#[no_mangle]
pub extern "C-unwind" fn ori_str_index(
    str_ptr: *const u8,
    str_len: i64,
    index: i64,
    out_ptr: *mut u8,
) {
    // 1. Bounds check
    // 2. Walk UTF-8 codepoints to find the nth one
    // 3. Extract the codepoint bytes
    // 4. Write an SSO OriStr containing just that codepoint to out_ptr
}
Spec reference: s[i] returns a single-codepoint str, not a char. The indexing is by codepoint position, not byte offset. (Spec: Clause 7 — Indexing)
UTF-8 walk: Use str_ptr[0..str_len] as a byte slice, iterate codepoints counting until reaching index, extract the codepoint bytes. This is O(n) but correct.
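A minimal sketch of the codepoint walk, assuming the input bytes are valid UTF-8. The helper name nth_codepoint is hypothetical, and it returns None where the runtime function would panic; it does not touch the OriStr layout or out_ptr.

```rust
/// Returns the bytes of the codepoint at `index` (counted in codepoints, not
/// bytes), or `None` if `index` is negative or out of range.
///
/// Assumes `bytes` is valid UTF-8; invalid input also yields `None` here.
fn nth_codepoint(bytes: &[u8], index: i64) -> Option<&[u8]> {
    // Negative indices are rejected before any walking happens.
    if index < 0 {
        return None;
    }
    let index = index as usize;
    let s = std::str::from_utf8(bytes).ok()?;
    // char_indices yields (byte_offset, char); nth skips `index` codepoints,
    // so the walk is O(n) in the position of the requested codepoint.
    let (start, ch) = s.char_indices().nth(index)?;
    Some(&bytes[start..start + ch.len_utf8()])
}
```

Returning a sub-slice keeps the walk separate from the output encoding, so the bounds check and the codepoint extraction can be tested without an OriStr.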
Step 2: Declare Runtime Function in LLVM Codegen
File: compiler/ori_llvm/src/codegen/runtime_decl/mod.rs (or the runtime function registry)
Register ori_str_index with signature:
- Params: ptr (str data), i64 (str len), i64 (index), ptr (out_ptr)
- Returns: void
- Attributes: nounwind is tentative. The function panics via C-unwind, so from LLVM's perspective it may unwind, and LLVM treats unwinding out of a nounwind function as undefined behavior.
Open question: check whether ori_str_index should panic via ori_panic_cstr (which is C-unwind) or via a Rust panic. Follow the pattern of ori_list_get for consistency.
Step 3: Add TypeInfo::Str Handler in Protocol Dispatch
File: compiler/ori_llvm/src/codegen/arc_emitter/apply_protocols.rs
At line 92, add a TypeInfo::Str arm alongside TypeInfo::List and TypeInfo::Map:
match &type_info {
    TypeInfo::List { element } => self.emit_list_index(recv, idx, *element),
    TypeInfo::Map { key, value } => self.emit_map_get(recv, idx, *key, *value),
    TypeInfo::Str => self.emit_str_index(recv, idx), // NEW
    _ => {
        tracing::warn!(?type_info, "__index on unsupported type");
        None
    }
}
Step 4: Implement emit_str_index
File: compiler/ori_llvm/src/codegen/arc_emitter/builtins/collections/ (new function, or in an existing string builtins file)
Follow the emit_list_index pattern:
- Extract data pointer from OriStr (handle SSO vs heap; use the ori_str_data() runtime call)
- Extract length (use the ori_str_len() runtime call)
- Allocate stack space for the output OriStr (24 bytes)
- Call ori_str_index(data, len, index, out_ptr)
- Load the result OriStr from the stack
- Return it as a ValueId
SSO consideration: The result is always a 1-4 byte codepoint, which always fits in SSO. The runtime function writes an SSO OriStr directly — no heap allocation needed for the result.
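The size argument can be checked with a small sketch. The layout used here is HYPOTHETICAL and chosen only for illustration (inline bytes first, inline length in the final byte of the 24-byte buffer); the real OriStr SSO encoding is not described in this section and may differ. The helper names are likewise assumptions.

```rust
/// Size of an OriStr in bytes, per the plan above.
const ORI_STR_SIZE: usize = 24;

/// HYPOTHETICAL layout for illustration only: inline bytes first, inline
/// length in the final byte. The real OriStr SSO encoding may differ.
fn write_sso_codepoint(ch: char, out: &mut [u8; ORI_STR_SIZE]) {
    // Clear the whole buffer so stale bytes never leak into the result.
    *out = [0u8; ORI_STR_SIZE];
    // encode_utf8 writes 1 to 4 bytes and returns the encoded slice.
    let len = ch.encode_utf8(&mut out[..4]).len();
    out[ORI_STR_SIZE - 1] = len as u8;
}

/// Reads back the inline bytes under the same hypothetical layout.
fn read_sso(out: &[u8; ORI_STR_SIZE]) -> &[u8] {
    let len = out[ORI_STR_SIZE - 1] as usize;
    &out[..len]
}
```

Because a UTF-8 codepoint never exceeds 4 bytes, the write needs no length check against the buffer and no heap fallback, which is the property the runtime function relies on.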
Test Strategy
Matrix Dimensions
Type dimension: str (only type being fixed)
Pattern dimension:
- ASCII indexing: "hello"[0] → "h"
- ASCII last: "hello"[4] → "o"
- Multibyte codepoint: "héllo"[1] → "é"
- OOB panic: "hello"[5] → panic
- Negative index: "hello"[-1] → panic (or # syntax if supported)
- Empty string: ""[0] → panic
- Single char: "x"[0] → "x"
Semantic Pin Tests
- Pin: ASCII indexing returns correct single-char string: "hello"[1] == "e" → true
- Pin: UTF-8 indexing counts codepoints not bytes: "héllo"[1] == "é" → true
- Pin: OOB panics: "hello"[10] → runtime panic with bounds message
TDD Ordering
- Write failing AOT test: s[0] on a string literal → currently crashes
- Write UTF-8 test: "héllo"[1] → currently crashes
- Write OOB test: "hello"[5] → should panic cleanly (not crash)
- Implement ori_str_index in ori_rt
- Declare runtime function in LLVM codegen
- Add TypeInfo::Str arm in apply_protocols.rs
- Implement emit_str_index
- Verify all tests pass in debug AND release
- Re-run bench_string.ori → should compile and run
Test Files
- compiler/ori_llvm/tests/aot/strings.rs: add string indexing AOT tests
- tests/spec/expressions/index_access.ori: existing spec tests (lines 206-228) should now pass in LLVM backend
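The pattern matrix and semantic pins in this test strategy could be written down as a table and cross-checked against a plain Rust oracle before the AOT tests are wired up. This is a sketch only: expected_index and matrix are hypothetical names, the AOT harness itself is not shown, and the negative-index row assumes the panic behavior rather than the # syntax.

```rust
/// Reference oracle: the expected result of `s[i]` as a single-codepoint
/// string, or `None` where the plan expects a panic.
fn expected_index(s: &str, i: i64) -> Option<String> {
    if i < 0 {
        return None;
    }
    s.chars().nth(i as usize).map(|c| c.to_string())
}

/// The test cases from this section as (input, index, expected) rows.
/// `None` marks the cases that should panic at runtime.
fn matrix() -> Vec<(&'static str, i64, Option<&'static str>)> {
    vec![
        ("hello", 0, Some("h")),  // ASCII indexing
        ("hello", 4, Some("o")),  // ASCII last
        ("hello", 1, Some("e")),  // pin: ASCII
        ("héllo", 1, Some("é")),  // multibyte codepoint
        ("hello", 5, None),       // OOB
        ("hello", 10, None),      // pin: OOB
        ("hello", -1, None),      // negative index (assumed to panic)
        ("", 0, None),            // empty string
        ("x", 0, Some("x")),      // single char
    ]
}
```

Keeping the rows in one table makes it easier to confirm that every matrix entry has a corresponding AOT test.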
§02.R Third Party Review Findings
- None.
Completion Checklist
- ori_str_index runtime function implemented and tested
- Runtime function declared in LLVM codegen
- TypeInfo::Str arm added to apply_protocols.rs
- emit_str_index implemented following emit_list_index pattern
- ASCII, UTF-8, and OOB tests passing
- bench_string.ori compiles and runs correctly
- ./test-all.sh passes in debug and release
- No regressions in existing string AOT tests
- /tpr-review passed: independent Codex review found no critical or major issues (or all findings triaged)
- /impl-hygiene-review passed: implementation hygiene review clean (phase boundaries, SSOT, algorithmic DRY, naming). MUST run AFTER /tpr-review is clean.
- /improve-tooling retrospective completed: MANDATORY at section close, after both reviews are clean. Reflect on the section's debugging journey (which diagnostics/ scripts you ran, which command sequences you repeated, where you added ad-hoc dbg!/tracing calls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE /commit-push. The retrospective is mandatory even when nothing felt painful; that is exactly when blind spots accumulate. See .claude/skills/improve-tooling/SKILL.md "Retrospective Mode" for the full protocol.
- /sync-claude section-close doc sync: verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW.