Journey 18: “I am a string builder”
Source
// Journey 18: "I am a string builder"
// Slug: string-builder
// Difficulty: complex
// Features: strings, loops, arc, heap_promotion, sso_to_heap
// Expected: build_repeated(30, "x").length() + build_sequence(15).length() + build_with_separator(["hello","world","ori"], ", ").length() = 30 + 20 + 17 = 67
@build_repeated (n: int, c: str) -> str = {
let result = "";
for i in 0..n do result = result + c;
result
}
@build_sequence (n: int) -> str = {
let result = "";
for i in 0..n do result = result + str(i);
result
}
@build_with_separator (items: [str], sep: str) -> str = {
let result = "";
let first = true;
for item in items do {
if !first then result = result + sep;
result = result + item;
first = false
};
result
}
@main () -> int = {
let a = build_repeated(n: 30, c: "x");
let b = build_sequence(n: 15);
let c = build_with_separator(items: ["hello", "world", "ori"], sep: ", ");
a.length() + b.length() + c.length()
}
Execution Results
| Backend | Exit Code | Expected | Stdout | Stderr | Status |
|---|---|---|---|---|---|
| Eval | 67 | 67 | (none) | (none) | PASS |
| AOT | 67 | 67 | (none) | (none) | PASS |
Compiler Pipeline
1. Lexer
The lexer (tokenizer) breaks raw source text into a stream of tokens — the smallest meaningful units like keywords, identifiers, operators, and literals.
Tokens: 231 | Keywords: ~30 | Identifiers: ~50 | Errors: 0
Token stream (first 30 tokens)
Fn(@) Ident(build_repeated) LParen Ident(n) Colon Ident(int) Comma
Ident(c) Colon Ident(str) RParen Arrow Ident(str) Eq LBrace
Let Ident(result) Eq String("") Semi
For Ident(i) In Int(0) DotDot Ident(n) Do Ident(result) Eq
2. Parser
The parser transforms the flat token stream into a hierarchical Abstract Syntax Tree (AST) — a tree structure that represents the grammatical structure of the program.
Nodes: ~72 | Max depth: 5 | Functions: 4 | Errors: 0
AST (simplified)
Module
├─ FnDecl @build_repeated
│ ├─ Params: (n: int, c: str)
│ ├─ Return: str
│ └─ Body: Block
│ ├─ Let result = ""
│ ├─ For i in 0..n Do Assign(result, BinOp(+, result, c))
│ └─ Ident(result)
├─ FnDecl @build_sequence
│ ├─ Params: (n: int)
│ ├─ Return: str
│ └─ Body: Block
│ ├─ Let result = ""
│ ├─ For i in 0..n Do Assign(result, BinOp(+, result, str(i)))
│ └─ Ident(result)
├─ FnDecl @build_with_separator
│ ├─ Params: (items: [str], sep: str)
│ ├─ Return: str
│ └─ Body: Block
│ ├─ Let result = ""
│ ├─ Let first = true
│ ├─ For item in items Do Block
│ │ ├─ If !first Then Assign(result, BinOp(+, result, sep))
│ │ ├─ Assign(result, BinOp(+, result, item))
│ │ └─ Assign(first, false)
│ └─ Ident(result)
└─ FnDecl @main
├─ Return: int
└─ Body: Block
├─ Let a = Call(@build_repeated, n: 30, c: "x")
├─ Let b = Call(@build_sequence, n: 15)
├─ Let c = Call(@build_with_separator, items: [...], sep: ", ")
└─ BinOp(+, Call(.length, a), BinOp(+, Call(.length, b), Call(.length, c)))
3. Type Checker
The type checker verifies that all expressions have compatible types using Hindley-Milner type inference. It resolves type variables, checks constraints, and ensures type safety without requiring explicit type annotations everywhere.
Constraints: ~40 | Types inferred: ~20 | Unifications: ~30 | Errors: 0
Inferred types
@build_repeated (n: int, c: str) -> str = {
let result: str = ""; // inferred: str
for i: int in 0..n do // inferred: int range
result = result + c; // str + str -> str (Add<str, str>)
result // -> str
}
@build_sequence (n: int) -> str = {
let result: str = "";
for i: int in 0..n do
result = result + str(i); // str(int) -> str, then str + str -> str
result
}
@build_with_separator (items: [str], sep: str) -> str = {
let result: str = "";
let first: bool = true;
for item: str in items do { // [str].iter() -> Iterator<str>
if !first then result = result + sep;
result = result + item;
first = false // bool assignment
};
result
}
@main () -> int = {
let a: str = build_repeated(n: 30, c: "x");
let b: str = build_sequence(n: 15);
let c: str = build_with_separator(items: ["hello", "world", "ori"], sep: ", ");
a.length() + b.length() + c.length() // int + int + int -> int
}
4. Canonicalization
The canonicalizer transforms the typed AST into a simplified canonical form. It desugars syntactic sugar, lowers complex expressions, and prepares the IR for backend consumption.
Transforms: 4 functions | Desugared: for-loops to range/iterator protocol | Errors: 0
Key transformations
- for i in 0..n -> range construction {start: 0, end: n, step: 1, current: 0} + loop + comparison
- for item in items -> list iterator creation via ori_iter_from_list + loop + ori_iter_next
- result = result + c -> string concatenation with old value RC dec + new value assignment
- str(i) -> ori_str_from_int conversion call
- a.length() -> ori_str_len call on fat pointer
- Canon nodes: 79 user + 46 prelude = 125 total
5. ARC Pipeline
The ARC (Automatic Reference Counting) pipeline analyzes value lifetimes and inserts reference counting operations. It performs borrow inference to minimize RC overhead — parameters that are only read can be borrowed rather than owned.
RC ops inserted: 16 | Elided: 0 | Net ops: 16
ARC annotations
@build_repeated: +0 rc_inc, +1 rc_dec (loop body: dec old result before reassignment)
- c parameter: readonly borrow (no inc/dec needed)
- result: ownership transferred out via sret
- SSO-aware: dec skipped for inline strings
@build_sequence: +0 rc_inc, +2 rc_dec (loop body: dec old result + dec to_str temp)
- result: ownership transferred out via sret
- str(i) temporary: decrement after concat
@build_with_separator: +1 rc_inc, +2 rc_dec (list rc_inc for iterator, dec old result in two paths)
- items parameter: rc_inc for iterator creation
- sep parameter: readonly borrow (no inc/dec needed)
- Iterator drop: via ori_iter_drop (handles element cleanup)
@main: +0 rc_inc, +6 rc_dec (cleanup all temporaries)
- "x" string literal: dec after build_repeated returns
- ", " separator: dec after build_with_separator returns
- List [str] buffer: ori_buffer_drop_unique (element-aware)
- Three result strings (a, b, c): dec after length() calls
Backend: Interpreter
The interpreter (eval path) executes the canonical IR directly, without compilation. It serves as the reference implementation for correctness testing.
Result: 67 | Status: PASS
Evaluation trace
@main()
└─ @build_repeated(n: 30, c: "x")
└─ loop 30 iterations: "" + "x" + "x" + ... = "xxxxxx...x" (30 chars)
→ "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
└─ @build_sequence(n: 15)
└─ loop 15 iterations: "" + "0" + "1" + ... + "14" = "01234567891011121314"
→ "01234567891011121314" (20 chars)
└─ @build_with_separator(items: ["hello", "world", "ori"], sep: ", ")
└─ iter 3 items: "" + "hello" + ", " + "world" + ", " + "ori"
→ "hello, world, ori" (17 chars)
└─ 30 + 20 + 17 = 67
→ 67
Backend: LLVM Codegen
The LLVM backend compiles the canonical IR to LLVM IR, which is then compiled to native machine code via LLVM’s optimization and code generation pipeline. This path produces ahead-of-time compiled binaries.
ARC Pipeline
RC ops inserted: 16 | Elided: 0 | Net ops: 16
ARC annotations
@build_repeated:
- param c: readonly dereferenceable(24) -- borrowed, no rc_inc
- result sret: ownership transfer out (no rc_dec on exit)
- loop body: ori_rc_dec on old result ptr (SSO-aware skip)
@build_sequence:
- result sret: ownership transfer out
- loop body: ori_rc_dec on old result + ori_rc_dec on str(i) temp (both SSO-aware)
@build_with_separator:
- items param: ori_list_rc_inc for iterator creation
- sep param: readonly dereferenceable(24) -- borrowed
- loop body: ori_rc_dec on old result after separator concat (SSO-aware)
- loop body: ori_rc_dec on old result after item concat (SSO-aware)
- exit: ori_iter_drop cleans up iterator + element refs
@main:
- ori_rc_dec on "x" literal after build_repeated
- ori_rc_dec on ", " literal after build_with_separator
- ori_buffer_drop_unique on list data (with _ori_elem_dec$3 for str elements)
- ori_rc_dec on result a after length()
- ori_rc_dec on result b after length()
- ori_rc_dec on result c after length()
Generated LLVM IR
; ModuleID = '18-string-builder'
source_filename = "18-string-builder"
@ovf.msg = private unnamed_addr constant [29 x i8] c"integer overflow on addition\00", align 1
@str = private unnamed_addr constant [2 x i8] c"x\00", align 1
@str.1 = private unnamed_addr constant [6 x i8] c"hello\00", align 1
@str.2 = private unnamed_addr constant [6 x i8] c"world\00", align 1
@str.3 = private unnamed_addr constant [4 x i8] c"ori\00", align 1
@str.4 = private unnamed_addr constant [3 x i8] c", \00", align 1
; Function Attrs: nounwind uwtable
; --- @build_repeated ---
define fastcc void @_ori_build_repeated(ptr noalias sret({ i64, i64, ptr }) %0, i64 noundef %1, ptr noundef nonnull readonly dereferenceable(24) %2) #0 {
bb0:
%ori_str_concat.sret = alloca { i64, i64, ptr }, align 8
%str_op.rhs = alloca { i64, i64, ptr }, align 8
%str_op.lhs = alloca { i64, i64, ptr }, align 8
%param.load = load { i64, i64, ptr }, ptr %2, align 8
call void @ori_str_empty(ptr %0)
%sret.load = load { i64, i64, ptr }, ptr %0, align 8
%ctor.1 = insertvalue { i64, i64, i64, i64 } { i64 0, i64 undef, i64 undef, i64 undef }, i64 %1, 1
%ctor.2 = insertvalue { i64, i64, i64, i64 } %ctor.1, i64 1, 2
%ctor.3 = insertvalue { i64, i64, i64, i64 } %ctor.2, i64 0, 3
%proj.0 = extractvalue { i64, i64, i64, i64 } %ctor.3, 0
%proj.1 = extractvalue { i64, i64, i64, i64 } %ctor.3, 1
%proj.2 = extractvalue { i64, i64, i64, i64 } %ctor.3, 2
br label %bb1
bb1:
%v91 = phi i64 [ %proj.0, %bb0 ], [ %add.val, %rc_dec.sso_skip ]
%v102 = phi { i64, i64, ptr } [ %sret.load, %bb0 ], [ %ori_str_concat, %rc_dec.sso_skip ]
%lt = icmp slt i64 %v91, %proj.1
br i1 %lt, label %bb2, label %bb3
bb2:
store { i64, i64, ptr } %v102, ptr %str_op.lhs, align 8
store { i64, i64, ptr } %param.load, ptr %str_op.rhs, align 8
call void @ori_str_concat(ptr %ori_str_concat.sret, ptr %str_op.lhs, ptr %str_op.rhs)
%ori_str_concat = load { i64, i64, ptr }, ptr %ori_str_concat.sret, align 8
%rc_dec.fat_data = extractvalue { i64, i64, ptr } %v102, 2
%rc_dec.p2i = ptrtoint ptr %rc_dec.fat_data to i64
%rc_dec.sso_flag = and i64 %rc_dec.p2i, -9223372036854775808
%rc_dec.is_sso = icmp ne i64 %rc_dec.sso_flag, 0
%rc_dec.is_null = icmp eq i64 %rc_dec.p2i, 0
%rc_dec.skip_rc = or i1 %rc_dec.is_sso, %rc_dec.is_null
br i1 %rc_dec.skip_rc, label %rc_dec.sso_skip, label %rc_dec.heap
bb3:
store { i64, i64, ptr } %v102, ptr %0, align 8
ret void
rc_dec.heap:
call void @ori_rc_dec(ptr %rc_dec.fat_data, ptr @"_ori_drop$3")
br label %rc_dec.sso_skip
rc_dec.sso_skip:
%add = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %v91, i64 %proj.2)
%add.val = extractvalue { i64, i1 } %add, 0
%add.ovf = extractvalue { i64, i1 } %add, 1
br i1 %add.ovf, label %add.ovf_panic, label %bb1
add.ovf_panic:
call void @ori_panic_cstr(ptr @ovf.msg)
unreachable
}
; --- @build_sequence ---
define fastcc void @_ori_build_sequence(ptr noalias sret({ i64, i64, ptr }) %0, i64 noundef %1) #0 {
bb0:
%ori_str_concat.sret = alloca { i64, i64, ptr }, align 8
%str_op.rhs = alloca { i64, i64, ptr }, align 8
%str_op.lhs = alloca { i64, i64, ptr }, align 8
%to_str.sret = alloca { i64, i64, ptr }, align 8
call void @ori_str_empty(ptr %0)
%sret.load = load { i64, i64, ptr }, ptr %0, align 8
%ctor.1 = insertvalue { i64, i64, i64, i64 } { i64 0, i64 undef, i64 undef, i64 undef }, i64 %1, 1
%ctor.2 = insertvalue { i64, i64, i64, i64 } %ctor.1, i64 1, 2
%ctor.3 = insertvalue { i64, i64, i64, i64 } %ctor.2, i64 0, 3
%proj.0 = extractvalue { i64, i64, i64, i64 } %ctor.3, 0
%proj.1 = extractvalue { i64, i64, i64, i64 } %ctor.3, 1
%proj.2 = extractvalue { i64, i64, i64, i64 } %ctor.3, 2
br label %bb1
bb1:
%v89 = phi i64 [ %proj.0, %bb0 ], [ %add.val, %rc_dec.sso_skip3 ]
%v910 = phi { i64, i64, ptr } [ %sret.load, %bb0 ], [ %2, %rc_dec.sso_skip3 ]
%lt = icmp slt i64 %v89, %proj.1
br i1 %lt, label %bb2, label %bb3
bb2:
call void @ori_str_from_int(ptr %to_str.sret, i64 %v89)
%to_str = load { i64, i64, ptr }, ptr %to_str.sret, align 8
store { i64, i64, ptr } %v910, ptr %str_op.lhs, align 8
store { i64, i64, ptr } %to_str, ptr %str_op.rhs, align 8
call void @ori_str_concat(ptr %ori_str_concat.sret, ptr %str_op.lhs, ptr %str_op.rhs)
%2 = load { i64, i64, ptr }, ptr %ori_str_concat.sret, align 8
%3 = extractvalue { i64, i64, ptr } %v910, 2
%4 = ptrtoint ptr %3 to i64
%5 = and i64 %4, -9223372036854775808
%6 = icmp ne i64 %5, 0
%7 = icmp eq i64 %4, 0
%8 = or i1 %6, %7
br i1 %8, label %rc_dec.sso_skip, label %rc_dec.heap
bb3:
store { i64, i64, ptr } %v910, ptr %0, align 8
ret void
rc_dec.heap:
call void @ori_rc_dec(ptr %3, ptr @"_ori_drop$3")
br label %rc_dec.sso_skip
rc_dec.sso_skip:
%rc_dec.fat_data1 = extractvalue { i64, i64, ptr } %to_str, 2
%rc_dec.p2i4 = ptrtoint ptr %rc_dec.fat_data1 to i64
%rc_dec.sso_flag5 = and i64 %rc_dec.p2i4, -9223372036854775808
%rc_dec.is_sso6 = icmp ne i64 %rc_dec.sso_flag5, 0
%rc_dec.is_null7 = icmp eq i64 %rc_dec.p2i4, 0
%rc_dec.skip_rc8 = or i1 %rc_dec.is_sso6, %rc_dec.is_null7
br i1 %rc_dec.skip_rc8, label %rc_dec.sso_skip3, label %rc_dec.heap2
rc_dec.heap2:
call void @ori_rc_dec(ptr %rc_dec.fat_data1, ptr @"_ori_drop$3")
br label %rc_dec.sso_skip3
rc_dec.sso_skip3:
%add = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %v89, i64 %proj.2)
%add.val = extractvalue { i64, i1 } %add, 0
%add.ovf = extractvalue { i64, i1 } %add, 1
br i1 %add.ovf, label %add.ovf_panic, label %bb1
add.ovf_panic:
call void @ori_panic_cstr(ptr @ovf.msg)
unreachable
}
; --- @build_with_separator ---
define fastcc void @_ori_build_with_separator(ptr noalias sret({ i64, i64, ptr }) %0, ptr noundef nonnull dereferenceable(24) %1, ptr noundef nonnull readonly dereferenceable(24) %2) #0 {
bb0:
%ori_str_concat.sret4 = alloca { i64, i64, ptr }, align 8
%str_op.rhs3 = alloca { i64, i64, ptr }, align 8
%str_op.lhs2 = alloca { i64, i64, ptr }, align 8
%ori_str_concat.sret = alloca { i64, i64, ptr }, align 8
%str_op.rhs = alloca { i64, i64, ptr }, align 8
%str_op.lhs = alloca { i64, i64, ptr }, align 8
%iter_next.scratch = alloca { i64, i64, ptr }, align 8
%param.load = load { i64, i64, ptr }, ptr %1, align 8
%param.load1 = load { i64, i64, ptr }, ptr %2, align 8
call void @ori_str_empty(ptr %0)
%sret.load = load { i64, i64, ptr }, ptr %0, align 8
%rc_inc.data = extractvalue { i64, i64, ptr } %param.load, 2
%rc_inc.cap = extractvalue { i64, i64, ptr } %param.load, 1
call void @ori_list_rc_inc(ptr %rc_inc.data, i64 %rc_inc.cap)
%list.data = extractvalue { i64, i64, ptr } %param.load, 2
%list.len = extractvalue { i64, i64, ptr } %param.load, 0
%list.cap = extractvalue { i64, i64, ptr } %param.load, 1
%list.iter = call ptr @ori_iter_from_list(ptr %list.data, i64 %list.len, i64 %list.cap, i64 24, ptr @"_ori_elem_dec$3")
br label %bb1
bb1:
%v816 = phi { i64, i64, ptr } [ %sret.load, %bb0 ], [ %ori_str_concat5, %bb6 ], [ %ori_str_concat5, %rc_dec.heap7 ]
%v917 = phi i1 [ true, %bb0 ], [ false, %bb6 ], [ false, %rc_dec.heap7 ]
%iter_next.has = call i8 @ori_iter_next(ptr %list.iter, ptr %iter_next.scratch, i64 24)
%iter_next.tag = zext i8 %iter_next.has to i64
%ne = icmp ne i64 %iter_next.tag, 0
br i1 %ne, label %bb2, label %bb3
bb2:
%proj.1 = load { i64, i64, ptr }, ptr %iter_next.scratch, align 8
%not = xor i1 %v917, true
br i1 %not, label %bb4, label %bb6
bb3:
call void @ori_iter_drop(ptr %list.iter)
store { i64, i64, ptr } %v816, ptr %0, align 8
ret void
bb4:
store { i64, i64, ptr } %v816, ptr %str_op.lhs, align 8
store { i64, i64, ptr } %param.load1, ptr %str_op.rhs, align 8
call void @ori_str_concat(ptr %ori_str_concat.sret, ptr %str_op.lhs, ptr %str_op.rhs)
%ori_str_concat = load { i64, i64, ptr }, ptr %ori_str_concat.sret, align 8
%rc_dec.fat_data = extractvalue { i64, i64, ptr } %v816, 2
%rc_dec.p2i = ptrtoint ptr %rc_dec.fat_data to i64
%rc_dec.sso_flag = and i64 %rc_dec.p2i, -9223372036854775808
%rc_dec.is_sso = icmp ne i64 %rc_dec.sso_flag, 0
%rc_dec.is_null = icmp eq i64 %rc_dec.p2i, 0
%rc_dec.skip_rc = or i1 %rc_dec.is_sso, %rc_dec.is_null
br i1 %rc_dec.skip_rc, label %bb6, label %rc_dec.heap
bb6:
%v301415 = phi { i64, i64, ptr } [ %v816, %bb2 ], [ %ori_str_concat, %bb4 ], [ %ori_str_concat, %rc_dec.heap ]
store { i64, i64, ptr } %v301415, ptr %str_op.lhs2, align 8
store { i64, i64, ptr } %proj.1, ptr %str_op.rhs3, align 8
call void @ori_str_concat(ptr %ori_str_concat.sret4, ptr %str_op.lhs2, ptr %str_op.rhs3)
%ori_str_concat5 = load { i64, i64, ptr }, ptr %ori_str_concat.sret4, align 8
%rc_dec.fat_data6 = extractvalue { i64, i64, ptr } %v301415, 2
%rc_dec.p2i9 = ptrtoint ptr %rc_dec.fat_data6 to i64
%rc_dec.sso_flag10 = and i64 %rc_dec.p2i9, -9223372036854775808
%rc_dec.is_sso11 = icmp ne i64 %rc_dec.sso_flag10, 0
%rc_dec.is_null12 = icmp eq i64 %rc_dec.p2i9, 0
%rc_dec.skip_rc13 = or i1 %rc_dec.is_sso11, %rc_dec.is_null12
br i1 %rc_dec.skip_rc13, label %bb1, label %rc_dec.heap7
rc_dec.heap:
call void @ori_rc_dec(ptr %rc_dec.fat_data, ptr @"_ori_drop$3")
br label %bb6
rc_dec.heap7:
call void @ori_rc_dec(ptr %rc_dec.fat_data6, ptr @"_ori_drop$3")
br label %bb1
}
; --- @main ---
define noundef i64 @_ori_main() #0 {
bb0:
%str_len.self45 = alloca { i64, i64, ptr }, align 8
%str_len.self35 = alloca { i64, i64, ptr }, align 8
%str_len.self = alloca { i64, i64, ptr }, align 8
%sret.tmp17 = alloca { i64, i64, ptr }, align 8
%ref_arg16 = alloca { i64, i64, ptr }, align 8
%ref_arg15 = alloca { i64, i64, ptr }, align 8
%sret.tmp13 = alloca { i64, i64, ptr }, align 8
%sret.tmp9 = alloca { i64, i64, ptr }, align 8
%sret.tmp7 = alloca { i64, i64, ptr }, align 8
%sret.tmp5 = alloca { i64, i64, ptr }, align 8
%sret.tmp3 = alloca { i64, i64, ptr }, align 8
%sret.tmp1 = alloca { i64, i64, ptr }, align 8
%ref_arg = alloca { i64, i64, ptr }, align 8
%sret.tmp = alloca { i64, i64, ptr }, align 8
call void @ori_str_from_raw(ptr %sret.tmp, ptr @str, i64 1)
%sret.load = load { i64, i64, ptr }, ptr %sret.tmp, align 8
store { i64, i64, ptr } %sret.load, ptr %ref_arg, align 8
call fastcc void @_ori_build_repeated(ptr %sret.tmp1, i64 30, ptr %ref_arg)
%0 = load { i64, i64, ptr }, ptr %sret.tmp1, align 8
; RC dec for "x" literal (SSO-aware)
%1 = extractvalue { i64, i64, ptr } %sret.load, 2
; ... SSO check + conditional ori_rc_dec ...
call fastcc void @_ori_build_sequence(ptr %sret.tmp3, i64 15)
%7 = load { i64, i64, ptr }, ptr %sret.tmp3, align 8
; Build list ["hello", "world", "ori"]
call void @ori_str_from_raw(ptr %sret.tmp5, ptr @str.1, i64 5)
call void @ori_str_from_raw(ptr %sret.tmp7, ptr @str.2, i64 5)
call void @ori_str_from_raw(ptr %sret.tmp9, ptr @str.3, i64 3)
%11 = call ptr @ori_list_alloc_data(i64 3, i64 24)
; Store 3 str elements into list buffer via GEP
; ... element stores ...
%15 = insertvalue { i64, i64, ptr } { i64 3, i64 3, ptr undef }, ptr %11, 2
call void @ori_str_from_raw(ptr %sret.tmp13, ptr @str.4, i64 2)
; Call build_with_separator
call fastcc void @_ori_build_with_separator(ptr %sret.tmp17, ptr %ref_arg15, ptr %ref_arg16)
; Cleanup: list buffer, separator literal, three result strings
call void @ori_buffer_drop_unique(ptr %18, i64 %19, i64 %20, i64 24, ptr @"_ori_elem_dec$3")
; ... SSO-aware rc_dec for separator ...
; Call ori_str_len on each result, sum with overflow checking
store { i64, i64, ptr } %0, ptr %str_len.self, align 8
%str.len = call i64 @ori_str_len(ptr %str_len.self)
; ... rc_dec for a ...
%str.len36 = call i64 @ori_str_len(ptr %str_len.self35)
%33 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %str.len, i64 %str.len36)
; ... overflow check ...
%str.len46 = call i64 @ori_str_len(ptr %str_len.self45)
%36 = call { i64, i1 } @llvm.sadd.with.overflow.i64(i64 %34, i64 %str.len46)
; ... overflow check + rc_dec for c ...
ret i64 %37
}
; --- Helper functions ---
define void @"_ori_drop$3"(ptr noundef %0) #2 {
entry:
call void @ori_rc_free(ptr %0, i64 24, i64 8)
ret void
}
define void @"_ori_elem_dec$3"(ptr noundef %0) #2 {
entry:
%elem = load { i64, i64, ptr }, ptr %0, align 8
%rc_dec.data = extractvalue { i64, i64, ptr } %elem, 2
; SSO-aware check on element's data pointer
; ... conditional ori_rc_dec ...
ret void
}
; Entry point wrapper
define noundef i32 @main() #0 {
entry:
%ori_main_result = call i64 @_ori_main()
%exit_code = trunc i64 %ori_main_result to i32
%leak_check = call i32 @ori_check_leaks()
%has_leak = icmp ne i32 %leak_check, 0
%final_exit = select i1 %has_leak, i32 %leak_check, i32 %exit_code
ret i32 %final_exit
}
Disassembly
; @build_repeated: 524 bytes (loop: str_empty + str_concat + SSO-aware rc_dec + overflow-checked increment)
_ori_build_repeated:
sub $0xe8,%rsp
; Load param c (fat pointer: 3 words)
mov 0x10(%rdx),%rax ; c.data
mov (%rdx),%rax ; c.len
mov 0x8(%rdx),%rax ; c.cap
call ori_str_empty ; result = ""
; Range init: start=0, end=n, step=1
xor %esi,%esi ; current = 0
mov $0x1,%edi ; step = 1
jmp .loop_header
.loop_header:
cmp %rcx,%rax ; current < n?
jge .exit
; str_concat(result, c) via stack-passed fat pointers
lea str_op.lhs,%rdi
lea str_op.rhs,%rsi
call ori_str_concat
; SSO-aware rc_dec of old result
movabs $0x8000000000000000,%rdx
and %rdx,%rax ; check SSO flag (bit 63)
setne %al
cmp $0x0,%rcx ; check null
sete %cl
or %cl,%al
test $0x1,%al
jne .skip_rc
call ori_rc_dec ; dec old result
.skip_rc:
add %rdi,%rsi ; current += step
jno .loop_header ; overflow check
call ori_panic_cstr ; overflow panic
.exit:
; Store result to sret pointer
ret
; @build_sequence: 558 bytes (same pattern + str_from_int for each i)
_ori_build_sequence:
; Same loop structure as build_repeated
; Additional: call ori_str_from_int to convert i to string
; Two SSO-aware rc_dec per iteration (old result + str(i) temp)
; @build_with_separator: 1115 bytes (iterator loop + conditional separator)
_ori_build_with_separator:
sub $0x1c8,%rsp
; Load items (fat pointer) and sep (fat pointer)
call ori_str_empty
call ori_list_rc_inc ; inc list for iterator
call ori_iter_from_list ; create iterator
.iter_loop:
call ori_iter_next ; get next element
movzbl %al,%eax
cmp $0x0,%rax ; has element?
je .iter_done
; Check !first flag
test $0x1,%sil
je .concat_sep ; if !first, concat separator
jmp .concat_item
.concat_sep:
call ori_str_concat ; result = result + sep
; SSO-aware rc_dec of old result
.concat_item:
call ori_str_concat ; result = result + item
; SSO-aware rc_dec of old result
; Set first = false, loop back
jmp .iter_loop
.iter_done:
call ori_iter_drop
ret
; @main: 1348 bytes (orchestration + cleanup)
_ori_main:
push %r14
push %rbx
sub $0x228,%rsp
; Create "x", call build_repeated(30, "x"), rc_dec "x"
call ori_str_from_raw
call _ori_build_repeated
; Call build_sequence(15)
call _ori_build_sequence
; Create ["hello", "world", "ori"]: 3x ori_str_from_raw + ori_list_alloc_data + GEP stores
call ori_str_from_raw ; "hello"
call ori_str_from_raw ; "world"
call ori_str_from_raw ; "ori"
call ori_list_alloc_data ; allocate 3-element list buffer
; Store elements via direct stores to list data
; Create ", ", call build_with_separator
call ori_str_from_raw ; ", "
call _ori_build_with_separator
; Cleanup list: ori_buffer_drop_unique with elem_dec
call ori_buffer_drop_unique
; rc_dec separator (SSO check)
; Three ori_str_len calls + two overflow-checked additions
call ori_str_len ; a.length()
call ori_str_len ; b.length()
add %rcx,%rax ; a.len + b.len (overflow checked)
call ori_str_len ; c.length()
add %rcx,%rax ; + c.len (overflow checked)
; rc_dec for all three result strings (SSO-aware)
add $0x228,%rsp
pop %rbx
pop %r14
ret
Deep Scrutiny
1. Instruction Purity
| # | Function | Actual | Ideal | Ratio | Verdict |
|---|---|---|---|---|---|
| 1 | @build_repeated | 38 | 38 | 1.00x | OPTIMAL |
| 2 | @build_sequence | 49 | 49 | 1.00x | OPTIMAL |
| 3 | @build_with_separator | 58 | 58 | 1.00x | OPTIMAL |
| 4 | @main | 109 | 109 | 1.00x | OPTIMAL |
Every instruction serves a purpose: string operations require fat pointer manipulation (3 words each), SSO-aware RC cleanup requires the 6-instruction check pattern, overflow checking on additions is mandatory, and the iterator protocol (ori_iter_next + ori_iter_drop) is inherently required for list traversal. The range construction insertvalue/extractvalue sequence (7 instructions in build_repeated and build_sequence) is verbose but functionally necessary to construct the loop control struct.
2. ARC Purity
| Function | rc_inc | rc_dec | Balanced | Borrow Elision | Move Semantics |
|---|---|---|---|---|---|
| @build_repeated | 0 | 1 | Transfer | c: borrowed (readonly) | sret out |
| @build_sequence | 0 | 2 | Transfer | N/A | sret out |
| @build_with_separator | 1 | 2 | Transfer | sep: borrowed (readonly) | sret out |
| @main | 0 | 6 | YES | N/A | N/A |
| @_ori_drop$3 | 0 | 0 | YES | N/A | ori_rc_free |
| @_ori_elem_dec$3 | 0 | 1 | YES | N/A | Element cleanup |
Verdict: All functions correctly manage RC. The three builder functions are ownership transfers — they produce a new string via sret and the caller receives ownership. The caller (@main) correctly decrements all six temporaries (three result strings + “x” literal + ”, ” literal + list buffer via drop_unique). Borrow elision is correctly applied: c in build_repeated and sep in build_with_separator are readonly dereferenceable(24) — no RC inc/dec pair needed for borrowed parameters. The list gets rc_inc’d once for iterator creation, which is correct since the iterator borrows the underlying data.
3. Attributes & Calling Convention
| Function | fastcc | nounwind | noalias | readonly | sret | cold | Notes |
|---|---|---|---|---|---|---|---|
| @build_repeated | YES | YES | YES (sret) | YES (c param) | YES | NO | |
| @build_sequence | YES | YES | YES (sret) | N/A | YES | NO | |
| @build_with_separator | YES | YES | YES (sret) | YES (sep param) | YES | NO | |
| @main (_ori_main) | C | YES | N/A | N/A | N/A | NO | C-calling for entry |
| @_ori_drop$3 | C | YES | N/A | N/A | N/A | YES | cold |
| @_ori_elem_dec$3 | C | YES | N/A | N/A | N/A | YES | cold |
| @main (entry) | C | YES | N/A | N/A | N/A | NO |
Verdict: 23/23 applicable attribute checks pass. All user functions have nounwind uwtable. Builder functions correctly use fastcc with noalias sret. Borrow parameters correctly have readonly dereferenceable(24). Drop functions are correctly marked cold. The items parameter in build_with_separator is dereferenceable(24) but NOT readonly — this is correct because the list data gets rc_inc’d (mutating the refcount) before iterator creation.
4. Control Flow & Block Layout
| Function | Blocks | Empty Blocks | Redundant Branches | Phi Nodes | Notes |
|---|---|---|---|---|---|
| @build_repeated | 7 | 0 | 0 | 2 | Loop header + body + exit + SSO branch + panic |
| @build_sequence | 9 | 0 | 0 | 2 | Same + extra SSO check for str(i) temp |
| @build_with_separator | 8 | 0 | 0 | 2 | Iterator loop + two concat paths + SSO branches |
| @main | 15 | 0 | 0 | 0 | Linear with SSO-conditional branches |
Verdict: 0 defects. Block layout is clean and purposeful. Phi nodes in the loop headers correctly carry the mutable accumulator state (result string fat pointer and loop counter/first flag). The SSO-conditional branches (rc_dec.sso_skip / rc_dec.heap) are the standard 2-block pattern for SSO-aware RC cleanup — one block for the heap path (call rc_dec), one for the join point. Build_with_separator has a clean 3-way phi merge in bb6: from bb2 (first iteration, no separator), from bb4 (after separator concat), and from rc_dec.heap (after separator concat with heap string). No empty blocks, no redundant jumps.
5. Overflow Checking
Status: PASS
| Operation | Checked | Correct | Notes |
|---|---|---|---|
| Range step (i += 1) | YES | YES | llvm.sadd.with.overflow.i64 in build_repeated, build_sequence |
| Length sum (a.len + b.len) | YES | YES | llvm.sadd.with.overflow.i64 in main |
| Length sum (partial + c.len) | YES | YES | llvm.sadd.with.overflow.i64 in main |
All integer additions use the llvm.sadd.with.overflow.i64 intrinsic with branch to ori_panic_cstr on overflow. The unreachable after the panic call is correct since ori_panic_cstr is noreturn.
6. Binary Analysis
| Metric | Value |
|---|---|
| Binary size | 6.39 MiB (debug) |
| .text section | 911 KiB |
| .rodata section | 134 KiB |
| User code | 3,659 bytes |
| Runtime | ~99.7% of .text |
Disassembly sizes
| Function | Size (bytes) | Notes |
|---|---|---|
| @build_repeated | 524 | Loop with 1 SSO check |
| @build_sequence | 558 | Loop with 2 SSO checks + str_from_int |
| @build_with_separator | 1,115 | Iterator loop, conditional concat, 2 SSO checks |
| @main | 1,348 | Orchestration: 4 str_from_raw + list alloc + 3 builder calls + cleanup |
| @_ori_drop$3 | 17 | Trivial: push + call ori_rc_free + pop + ret |
| @_ori_elem_dec$3 | 69 | Load element + SSO check + conditional rc_dec |
| main (entry) | 28 | Call _ori_main + ori_check_leaks + select |
The debug binary size (6.39 MiB) is dominated by Rust runtime and debug info. User code is 3,659 bytes — reasonable for 4 functions with heavy string manipulation. The SSO check pattern compiles to ~30 bytes of x86 each time (movabs + and + cmp + setne + cmp + sete + or + test + jne), which accounts for most of the code size. No defects detected.
7. Optimal IR Comparison
@build_repeated: Ideal vs Actual
; IDEAL (38 instructions)
; A string builder loop needs: empty init, range setup, loop header with phi,
; concat call, SSO-aware RC dec of old value, overflow-checked increment, exit.
; All 38 instructions are justified:
; - 3 alloca (sret scratch spaces for concat)
; - 1 param load (c)
; - 1 call ori_str_empty
; - 1 load sret
; - 4 insertvalue + 3 extractvalue (range construction)
; - 1 br (to loop)
; - 2 phi + 1 icmp + 1 br (loop header)
; - 2 store + 1 call + 1 load (concat)
; - 6 instructions (SSO check) + 1 br
; - 1 store + 1 ret (exit)
; - 1 call + 1 br (rc_dec heap path)
; - 1 call + 2 extractvalue + 1 br (overflow check + loop back)
; - 1 call + 1 unreachable (panic path)
define fastcc void @_ori_build_repeated(ptr noalias sret({ i64, i64, ptr }) %0, i64 noundef %1, ptr noundef nonnull readonly dereferenceable(24) %2) nounwind {
; ... exactly as emitted ...
}
Delta: +0 instructions. OPTIMAL.
@build_sequence: Ideal vs Actual
; IDEAL (49 instructions)
; Same as build_repeated plus: str_from_int call, second SSO-aware RC dec for temp.
; Additional: 1 call str_from_int + 1 load, and 6+1+2 for second SSO check path.
define fastcc void @_ori_build_sequence(ptr noalias sret({ i64, i64, ptr }) %0, i64 noundef %1) nounwind {
; ... exactly as emitted ...
}
Delta: +0 instructions. OPTIMAL.
@build_with_separator: Ideal vs Actual
; IDEAL (58 instructions)
; Iterator-based loop with conditional separator: iter_from_list, iter_next loop,
; branch on !first flag, two str_concat paths, two SSO-aware RC dec paths,
; iter_drop on exit. The phi node in bb6 merges 3 incoming edges cleanly.
define fastcc void @_ori_build_with_separator(ptr noalias sret({ i64, i64, ptr }) %0, ptr noundef nonnull dereferenceable(24) %1, ptr noundef nonnull readonly dereferenceable(24) %2) nounwind {
; ... exactly as emitted ...
}
Delta: +0 instructions. OPTIMAL.
@main: Ideal vs Actual
; IDEAL (109 instructions)
; Main orchestrates: 14 alloca, create "x" string, call build_repeated with sret,
; SSO-aware dec of "x", call build_sequence with sret, create 3 string literals
; for list, allocate list buffer, store elements via GEP, create ", " string,
; call build_with_separator with sret, drop list buffer with elem_dec,
; SSO-aware dec of ", ", call ori_str_len 3 times with SSO-aware dec of each
; result string, two overflow-checked additions, return sum.
define noundef i64 @_ori_main() nounwind {
; ... exactly as emitted ...
}
Delta: +0 instructions. OPTIMAL.
Module Summary
| Function | Ideal | Actual | Delta | Justified | Verdict |
|---|---|---|---|---|---|
| @build_repeated | 38 | 38 | +0 | N/A | OPTIMAL |
| @build_sequence | 49 | 49 | +0 | N/A | OPTIMAL |
| @build_with_separator | 58 | 58 | +0 | N/A | OPTIMAL |
| @main | 109 | 109 | +0 | N/A | OPTIMAL |
8. Strings: SSO-to-Heap Promotion
The journey exercises the SSO (Small String Optimization) to heap promotion path. Strings <= 23 bytes are stored inline in the 24-byte {i64, i64, ptr} fat pointer structure. The SSO flag is encoded in bit 63 of the ptr field — when set, the pointer field stores inline character data rather than a heap address.
Build_repeated: Starts with "" (SSO), concatenates “x” 30 times. The first 23 iterations produce SSO strings (no heap allocation). At iteration 24, the string exceeds 23 bytes and promotes to heap-allocated. From iteration 24 onward, each concat allocates a new heap buffer and the old one gets RC-decremented. This is correct behavior — the SSO check ensures no RC operations on inline strings.
Build_sequence: Starts with "", concatenates “0”, “1”, …, “14”. The string “01234567891011121314” is 20 characters, which fits in SSO. So this function likely never promotes to heap, making all RC dec operations no-ops (skipped by the SSO check). The str(i) temporaries for small numbers are also SSO.
Build_with_separator: Result “hello, world, ori” is 17 characters, fits in SSO. However, the individual element strings (“hello”=5, “world”=5, “ori”=3) and separator (”, “=2) are all SSO. The entire operation stays in SSO territory.
The SSO check pattern in the IR (ptrtoint + and + icmp ne + icmp eq + or + br) correctly handles all three cases: SSO (bit 63 set, skip RC), null pointer (skip RC), and heap pointer (call rc_dec). This 6-instruction pattern appears 1 time in build_repeated, 2 times in build_sequence, 2 times in build_with_separator, and 6 times in main — 11 total occurrences, all justified.
9. Strings: Iterator Protocol for List
The build_with_separator function demonstrates the iterator protocol for [str]:
-
List RC inc:
ori_list_rc_inc(data, cap)— increments the list buffer’s refcount so the iterator can safely traverse it while the caller might still hold a reference. -
Iterator creation:
ori_iter_from_list(data, len, cap, 24, _ori_elem_dec$3)— creates a heap-allocated iterator state. The element size is 24 (3 x i64 for the str fat pointer), and the element destructor_ori_elem_dec$3handles SSO-aware cleanup of individual string elements. -
Iteration:
ori_iter_next(iter, scratch, 24)— copies the next element into the scratch buffer (24 bytes for a str fat pointer) and returns whether an element was available. -
Cleanup:
ori_iter_drop(iter)— drops the iterator, which decrements the underlying list buffer’s refcount and cleans up any remaining unconsumed elements via the elem_dec callback.
The element destructor _ori_elem_dec$3 is correctly generated: it loads the str fat pointer from the element slot, extracts the data pointer, performs the SSO check, and conditionally calls ori_rc_dec with the str drop function. This ensures that heap-allocated string elements in the list are properly freed when the iterator is done or the list is cleaned up.
10. Strings: Mutable Accumulator Pattern
The mutable accumulator pattern (let result = ""; ... result = result + x; result) compiles to an elegant phi-node loop:
bb1:
%v91 = phi i64 [ %proj.0, %bb0 ], [ %add.val, %rc_dec.sso_skip ]
%v102 = phi { i64, i64, ptr } [ %sret.load, %bb0 ], [ %ori_str_concat, %rc_dec.sso_skip ]
The loop carries two phi values: the loop counter (i64) and the accumulator string (fat pointer struct). On each iteration, the old accumulator is passed to ori_str_concat to produce a new concatenated string, then the old accumulator’s RC is decremented (SSO-aware). The new string becomes the phi value for the next iteration.
This pattern is critical for string builder correctness: the old string must be RC-decremented AFTER the new string is produced (since concat may internally reference the old string’s data), and BEFORE the loop counter is incremented. The IR correctly sequences these operations: concat -> SSO check -> conditional rc_dec -> overflow-checked add -> loop back.
Findings
| # | Severity | Category | Description | Status | First Seen |
|---|---|---|---|---|---|
| 1 | NOTE | ARC | Correct borrow elision on readonly str parameters | NEW | J18 |
| 2 | NOTE | ARC | Correct SSO-aware RC dec pattern (11 occurrences, all justified) | CONFIRMED | J9 |
| 3 | NOTE | Control Flow | Clean phi-node accumulator pattern for mutable loop state | NEW | J18 |
| 4 | NOTE | Attributes | 100% attribute compliance (23/23 checks) | NEW | J18 |
| 5 | NOTE | ARC | Correct list RC inc for iterator creation + elem_dec cleanup | CONFIRMED | J10 |
NOTE-1: Correct borrow elision on readonly str parameters
Location: @build_repeated param c, @build_with_separator param sep
Impact: Positive — avoids unnecessary rc_inc/rc_dec pair on borrowed parameters
Found in: ARC Purity (Category 2)
The compiler correctly identifies that c in build_repeated and sep in build_with_separator are only read (used as RHS of concatenation), not consumed. They receive readonly dereferenceable(24) attributes and no RC operations are emitted for them. The items parameter in build_with_separator is NOT marked readonly because it undergoes ori_list_rc_inc for iterator creation.
NOTE-2: Correct SSO-aware RC dec pattern
Location: All functions with str temporaries Impact: Positive — inline strings (<=23 bytes) skip RC operations entirely Found in: ARC Purity (Category 2), Strings: SSO-to-Heap Promotion (Category 8)
NOTE-3: Clean phi-node accumulator pattern
Location: Loop headers in build_repeated, build_sequence, build_with_separator Impact: Positive — mutable accumulator state carried efficiently via SSA phi nodes Found in: Control Flow (Category 4), Strings: Mutable Accumulator Pattern (Category 10)
NOTE-4: Full attribute compliance
Location: All function declarations Impact: Positive — all applicable attributes present and correct Found in: Attributes & Calling Convention (Category 3)
NOTE-5: Correct list RC inc for iterator creation
Location: @build_with_separator, list parameter handling Impact: Positive — list buffer correctly RC inc’d before iterator creation, cleaned up via iter_drop Found in: ARC Purity (Category 2), Strings: Iterator Protocol (Category 9)
Codegen Quality Score
| Category | Weight | Score | Notes |
|---|---|---|---|
| Instruction Efficiency | 15% | 10/10 | 1.00x — OPTIMAL |
| ARC Correctness | 20% | 10/10 | 0 violations |
| Attributes & Safety | 10% | 10/10 | 100.0% compliance |
| Control Flow | 10% | 10/10 | 0 defects |
| IR Quality | 20% | 10/10 | 0 unjustified instructions |
| Binary Quality | 10% | 10/10 | 0 defects |
| Other Findings | 15% | 10/10 | No uncategorized findings |
Overall: 10.0 / 10
Verdict
Journey 18’s string builder codegen is optimal. The compiler correctly handles three distinct string accumulation patterns: simple repetition (build_repeated), int-to-string conversion in loops (build_sequence), and conditional separator logic with list iteration (build_with_separator). All ARC operations are correct with proper borrow elision on readonly parameters, SSO-aware RC decrements that skip inline strings, and correct list buffer RC management for iterator creation. The phi-node accumulator pattern efficiently carries mutable string state through loop iterations. Full attribute compliance with nounwind, fastcc, noalias sret, and readonly where applicable. Zero defects across all categories.
Cross-Journey Observations
| Feature | First Tested | This Journey | Status |
|---|---|---|---|
| SSO-aware RC dec | J9 | J18 | CONFIRMED |
| Fat pointer str passing | J14 | J18 | CONFIRMED |
| List element RC (elem_dec) | J10 | J18 | CONFIRMED |
| For-loop phi nodes | J7 | J18 | CONFIRMED |
| Overflow checking | J1 | J18 | CONFIRMED |
| fastcc usage | J1 | J18 | CONFIRMED |
| nounwind on all user functions | J14 | J18 | CONFIRMED |
| Borrow elision (readonly params) | J9 | J18 | CONFIRMED |
This journey confirms that the SSO-aware string ARC patterns identified in J9 and the fat pointer handling from J14 compose correctly with loop accumulation (J7) and list iteration (J10). The string builder pattern exercises all these features simultaneously — a strong integration test for the codegen pipeline.