Section 07: Constant Deduplication
Status: Complete
Goal: Each unique string constant (e.g., "integer overflow on addition\00") is emitted as a single LLVM global, shared across all use sites. Zero duplicate constant strings in emitted IR.
Context: The codegen emits identical overflow message strings as separate globals for each overflow check site. J2 has 2 duplicates, J7 has 6, J9 has 7, J12 has 6. While LLVM’s linker may merge unnamed_addr constants at link time, the IR is unnecessarily verbose and the duplicate creation wastes module-level resources.
Journeys affected: J2, J3, J4, J6, J7, J8, J9, J10, J11, J12. (10 of 12 journeys — this is the single most pervasive finding.)
Reference implementations:
- Rust rustc_codegen_llvm/common.rs: Uses const_str(), which interns all string constants (same string → same global).
- LLVM itself: unnamed_addr constants with identical content are candidates for COMDAT folding at link time, but emitting one global from the start is strictly better.
07.1 String Constant Interning
File(s): compiler/ori_llvm/src/codegen/ir_builder/constants.rs, compiler/ori_llvm/src/codegen/ir_builder/checked_ops.rs
The duplicate globals came from build_global_string_ptr(), not const_string(). const_string() creates inline byte arrays; build_global_string_ptr() creates named global string pointers — the ones that duplicated. It’s called from:
- compiler/ori_llvm/src/codegen/ir_builder/checked_ops.rs: overflow panic messages (extracted from arithmetic.rs). This was the primary source of duplicates: emit_checked_binop() called the raw inkwell self.builder.build_global_string_ptr() directly. Fixed to use self.build_global_string_ptr() (the IrBuilder wrapper with dedup cache).
- compiler/ori_llvm/src/codegen/arc_emitter/value_emission.rs (line 45): string literal emission (uses the IrBuilder wrapper, so it benefits from the cache automatically).
- compiler/ori_llvm/src/codegen/derive_codegen/string_helpers.rs (line 30): derive codegen string construction (uses fc.builder_mut().build_global_string_ptr(), which IS the IrBuilder wrapper, so it benefits from the cache automatically).
Correction from original plan: Only 1 of 3 call sites bypassed the IrBuilder wrapper (arithmetic.rs), not 2. string_helpers.rs already went through the IrBuilder via FunctionCompiler::builder_mut().
Implementation: Added global_strings: FxHashMap<String, ValueId> to IrBuilder. build_global_string_ptr() checks cache by content before creating globals. New globals marked with unnamed_addr (Global) for linker-level COMDAT folding.
- Split arithmetic.rs into submodules BEFORE other §07 work (513 → 352 lines in arithmetic.rs + 162 lines in checked_ops.rs)
- Add a FxHashMap<String, ValueId> to the IR builder codegen state
- Modify build_global_string_ptr() in constants.rs to check cache before creating globals
- Refactor emit_checked_binop() in checked_ops.rs to use IrBuilder::build_global_string_ptr() instead of raw inkwell self.builder.build_global_string_ptr(); also refactored panic call to use self.call()
- derive_codegen/string_helpers.rs already uses IrBuilder wrapper (via fc.builder_mut()); no refactoring needed (plan was incorrect about bypass)
- Cache key uses full byte content (the value: &str parameter), not the display label (name)
- Mark deduplicated globals with unnamed_addr (Global) to enable linker-level COMDAT folding
- Verify: J7 IR has exactly 1 "integer overflow on addition\00" global (was 6)
- Verify: J9 IR has exactly 1 of each overflow message (was 7)
- Count: All 12 journeys now have exactly 1 global per unique overflow message, with zero duplicates
- Unit test in IrBuilder: global_string_ptr_dedup_same_content, global_string_ptr_different_content_distinct, global_string_ptr_unnamed_addr
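The content-keyed cache described above can be illustrated with a minimal, self-contained sketch. This is not the actual IrBuilder code: ToyIrBuilder, GlobalString, and this ValueId are hypothetical stand-ins, std's HashMap replaces FxHashMap so the example has no dependencies, and no real LLVM globals are created. It only shows the lookup-before-create pattern and that the key is the string content rather than the display name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the builder's opaque value handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValueId(usize);

/// Hypothetical record of one emitted global string.
#[derive(Debug)]
struct GlobalString {
    name: String,
    content: String,
    unnamed_addr: bool,
}

/// Toy builder holding only the state relevant to string interning.
#[derive(Default)]
struct ToyIrBuilder {
    globals: Vec<GlobalString>,
    // Keyed by string content, not by the display name.
    global_strings: HashMap<String, ValueId>,
}

impl ToyIrBuilder {
    /// Returns the cached global for `value` if one exists; otherwise
    /// records a new global, marks it unnamed_addr, and caches it.
    fn build_global_string_ptr(&mut self, value: &str, name: &str) -> ValueId {
        if let Some(&id) = self.global_strings.get(value) {
            return id; // cache hit: no new global is emitted
        }
        let id = ValueId(self.globals.len());
        self.globals.push(GlobalString {
            name: name.to_string(),
            content: value.to_string(),
            unnamed_addr: true,
        });
        self.global_strings.insert(value.to_string(), id);
        id
    }
}

fn main() {
    let mut b = ToyIrBuilder::default();
    // Three overflow check sites, two of them sharing the same message.
    let a = b.build_global_string_ptr("integer overflow on addition", "ovf.msg");
    let c = b.build_global_string_ptr("integer overflow on addition", "other.label");
    let d = b.build_global_string_ptr("integer overflow on subtraction", "ovf.msg");

    assert_eq!(a, c); // same content, different name: shared global
    assert_ne!(a, d); // different content, same name: distinct global
    assert_eq!(b.globals.len(), 2);

    for g in &b.globals {
        println!("{} = {:?} (unnamed_addr: {})", g.name, g.content, g.unnamed_addr);
    }
}
```

Keying on content means a call site that passes a different label for the same message still reuses the first global, which is the behavior the three unit tests listed above are named after. A cache like this only helps call sites that go through the wrapper, which is why the one site calling the raw inkwell builder had to be refactored.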
07.1 Completion Checklist
- String constant cache implemented in IR builder (global_strings: FxHashMap<String, ValueId>)
- build_global_string_ptr() deduplicates by content, not by name
- Count of global definitions for each overflow string is 1 per module
- No duplicate @.str.* globals with identical content
- J7 IR has exactly 1 "integer overflow on addition\00" global
- J9 IR has exactly 1 of each overflow message
- Deduplicated globals have unnamed_addr for linker-level folding
- IR test: program with 3 overflow sites has 1 overflow message global (not 3): global_string_ptr_dedup_same_content
- ./test-all.sh green (12,067 tests, 0 failures)
- ./clippy-all.sh green
- No regressions in cargo test -p ori_llvm (428 tests, 3 new)
Section 07 Exit Criteria
For any program, ORI_DUMP_AFTER_LLVM=1 shows at most one global per unique string value. No duplicated string constants in emitted IR for any of the 12 code journeys. ✓ Verified.
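One way to check the exit criterion mechanically is to scan dumped textual IR for string constants that are defined more than once. The sketch below is an assumption about how such a check could look, not part of the Ori test suite: the helper names are hypothetical, the sample IR is hand-written for illustration, and the parsing is deliberately simplistic (it assumes each global definition sits on one line and that global names are unquoted).

```rust
use std::collections::BTreeMap;

/// Extracts the payload of a `c"..."` constant from one line of textual IR,
/// if the line defines a global (starts with '@').
/// Simplification: quoted global names such as @"x" are not handled.
fn string_payload(line: &str) -> Option<&str> {
    let line = line.trim_start();
    if !line.starts_with('@') {
        return None;
    }
    let start = line.find("c\"")? + 2;
    // Quotes inside the payload are escaped as \22 in LLVM IR, so the
    // next '"' terminates the constant.
    let len = line[start..].find('"')?;
    Some(&line[start..start + len])
}

/// Returns every string payload defined by more than one global,
/// together with its number of definitions.
fn duplicate_string_globals(ir: &str) -> BTreeMap<&str, usize> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for payload in ir.lines().filter_map(string_payload) {
        *counts.entry(payload).or_insert(0) += 1;
    }
    counts.retain(|_, n| *n > 1);
    counts
}

fn main() {
    // Hand-written sample resembling pre-fix output; names are illustrative.
    let ir = r#"
@.str = private unnamed_addr constant [29 x i8] c"integer overflow on addition\00", align 1
@.str.1 = private unnamed_addr constant [29 x i8] c"integer overflow on addition\00", align 1
@.str.2 = private unnamed_addr constant [32 x i8] c"integer overflow on subtraction\00", align 1
"#;
    let dups = duplicate_string_globals(ir);
    assert_eq!(dups.len(), 1);
    assert_eq!(dups.get("integer overflow on addition\\00"), Some(&2));

    for (payload, n) in &dups {
        println!("{n} definitions of c\"{payload}\"");
    }
}
```

On IR that satisfies the exit criterion, duplicate_string_globals returns an empty map, so a regression test could assert emptiness for each journey's dump.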