Section 02: LLVM Metadata Infrastructure
Status: Not Started
Goal: Build the metadata emission layer that Sections 03 (AIMS export) and 04 (TBAA/range) need to attach optimization-relevant metadata to LLVM instructions. Currently, ori_llvm emits LLVM attributes extensively (noalias, readonly, nonnull, dereferenceable, memory()) but has zero metadata emission — no !tbaa, no !range, no !invariant.load. inkwell 0.8 may not expose MDBuilder directly, but llvm-sys 211 (already a dependency) provides the raw C API.
Success Criteria:
-
IrBuilderhasset_tbaa_metadata,set_range_metadata,set_invariant_load_metadatamethods - Metadata nodes are constructed via llvm-sys FFI (or inkwell if supported)
-
ArcInstr::Projectcarriesstruct_ty: Idxin addition to existingty: Idx(result type) - A smoke test constructs TBAA metadata, attaches it to a load, and
ORI_DUMP_AFTER_LLVM=1shows!tbaain the output -
./test-all.shgreen — metadata attachment does not affect existing behavior
Context: LLVM’s optimizer uses metadata to make better decisions. !tbaa tells the alias analysis that loads/stores through different struct field types cannot alias. !range tells value tracking that a loaded integer is within a known range. !invariant.load tells the optimizer that a loaded value never changes. All three are attached to individual load/store instructions via LLVMSetMetadata(). The infrastructure needed is: (1) a way to create metadata nodes (MDNode/MDString), (2) a way to attach them to instructions, and (3) type information available at the attachment site.
Reference implementations:
- LLVM C API
llvm-c/Core.h:LLVMSetMetadata(Val, KindID, Node),LLVMMDString(),LLVMMDNode() - Rust llvm-sys
core.rs: Already used inori_llvmforLLVMRunPasses,LLVMPassBuilderOptionsCreate, debug info builders
Depends on: Nothing (can start immediately).
02.1 llvm-sys Metadata Wrapper
File(s): compiler/ori_llvm/src/codegen/ir_builder/metadata.rs (NEW)
Create a thin Rust wrapper around llvm-sys metadata functions. This isolates unsafe FFI in one module.
-
Create
compiler/ori_llvm/src/codegen/ir_builder/metadata.rs -
Implement core metadata construction functions:
use llvm_sys::core::*; use llvm_sys::prelude::*; /// Metadata kind IDs (cached from LLVMGetMDKindID) pub struct MetadataKinds { pub tbaa: u32, pub range: u32, pub invariant_load: u32, pub alias_scope: u32, pub noalias: u32, } impl MetadataKinds { pub fn new(context: LLVMContextRef) -> Self { unsafe { Self { tbaa: LLVMGetMDKindIDInContext(context, b"tbaa\0".as_ptr().cast(), 4), range: LLVMGetMDKindIDInContext(context, b"range\0".as_ptr().cast(), 5), invariant_load: LLVMGetMDKindIDInContext(context, b"invariant.load\0".as_ptr().cast(), 14), alias_scope: LLVMGetMDKindIDInContext(context, b"alias.scope\0".as_ptr().cast(), 11), noalias: LLVMGetMDKindIDInContext(context, b"noalias\0".as_ptr().cast(), 7), } } } } /// Attach metadata to an LLVM instruction. pub unsafe fn set_metadata(instr: LLVMValueRef, kind_id: u32, md_node: LLVMValueRef) { LLVMSetMetadata(instr, kind_id, md_node); } /// Create an empty metadata node (for !invariant.load). pub unsafe fn empty_md_node(context: LLVMContextRef) -> LLVMValueRef { LLVMMDNodeInContext(context, std::ptr::null_mut(), 0) } /// Create a range metadata node: !{i64 lo, i64 hi} (value in [lo, hi)). pub unsafe fn range_md_node(context: LLVMContextRef, lo: i64, hi: i64) -> LLVMValueRef { let i64_ty = LLVMInt64TypeInContext(context); let lo_val = LLVMConstInt(i64_ty, lo as u64, 1); let hi_val = LLVMConstInt(i64_ty, hi as u64, 1); let mut vals = [lo_val, hi_val]; LLVMMDNodeInContext(context, vals.as_mut_ptr(), 2) } -
Add
mod metadata;tocompiler/ori_llvm/src/codegen/ir_builder/mod.rs -
Add TBAA root and struct-field node constructors (TBAA tree structure)
-
Write unit tests in
compiler/ori_llvm/src/codegen/ir_builder/metadata/tests.rs:- Test
MetadataKinds::new()returns non-zero kind IDs - Test
range_md_nodecreates a valid 2-element node - Test
empty_md_nodecreates a valid 0-element node
- Test
-
Verify:
timeout 150 cargo t -p ori_llvmpasses -
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (02.1) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-02.1 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.1: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
02.2 IrBuilder Metadata Helpers
File(s): compiler/ori_llvm/src/codegen/ir_builder/metadata.rs (extends the NEW file from 02.1), compiler/ori_llvm/src/codegen/ir_builder/mod.rs (411 lines — IrBuilder struct definition)
Add metadata attachment methods to IrBuilder so callers at load()/store() sites can attach metadata without touching llvm-sys directly. The IrBuilder already wraps inkwell’s builder — these methods extend it with metadata capabilities. Methods go in metadata.rs (keeping the metadata module self-contained) while the MetadataKinds field is added to the IrBuilder struct in mod.rs.
-
Add
metadata_kinds: MetadataKindsfield toIrBuilderinmod.rs, initialized duringIrBuilder::new()(requiresLLVMContextRef— obtain viainkwell::Context::as_mut_ptr()or equivalent) -
In
metadata.rs, addimpl IrBuilderblock with metadata attachment methods:set_load_tbaa(&self, load_val: ValueId, tbaa_node: LLVMValueRef)— attaches!tbaato a load instructionset_load_range(&self, load_val: ValueId, lo: i64, hi: i64)— attaches!rangeto a loadset_load_invariant(&self, load_val: ValueId)— attaches!invariant.loadto a loadset_instruction_alias_scope(&self, instr: ValueId, scope_node: LLVMValueRef)— attaches!alias.scopeset_instruction_noalias(&self, instr: ValueId, scope_node: LLVMValueRef)— attaches!noalias
-
Each method must: (a) resolve
ValueIdto rawLLVMValueRefvia the value arena, (b) call the corresponding metadata wrapper function from 02.1 -
Write tests in
metadata/tests.rsverifying metadata attachment doesn’t crash and appears in IR dump -
Verify:
timeout 150 cargo t -p ori_llvmpasses -
Verify:
mod.rsstays under 500 lines (currently 411 + 1 field + 1 init line = ~413) -
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. -
Subsection close-out (02.2) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-02.2 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.2: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. -
Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
02.3 Extend ArcInstr::Project with Struct Type
File(s): compiler/ori_arc/src/ir/instr.rs
Currently ArcInstr::Project { dst, ty, value, field } (instr.rs lines 58-64) stores ty (the result type of the projection) but not the containing struct’s type. To emit TBAA metadata at struct_gep sites, the codegen needs to know “this is field 2 of struct Point” — not just “this loads an int.” Add struct_ty: Idx to carry the source struct’s type through the pipeline.
- Add
struct_ty: Idxfield toArcInstr::Projectvariant incompiler/ori_arc/src/ir/instr.rs - Update all construction sites of
ArcInstr::Projectincompiler/ori_arc/src/lower/:lower/expr/mod.rs— struct field access lowering (populatestruct_tyfrom the source expression’s type — note: the source isvalue: ArcVarId, so look upfunc.var_types[value]for the struct type)lower/expr/collections.rs— collection field access (if applicable)lower/builder/emission.rs— builder emission (has Project construction)- Search:
grep -rn "Project {" compiler/ori_arc/src/to find all construction sites (count may have changed — re-verify at implementation time)
- Update all pattern matches on
ArcInstr::Project— addstruct_tyto destructuring:compiler/ori_arc/src/aims/(transfer functions, emission)compiler/ori_arc/src/(verification, liveness, drop analysis)compiler/ori_llvm/src/codegen/arc_emitter/instr_dispatch.rs(LLVM emission)- Search:
grep -rn "Project {" compiler/to find all match sites
- For non-struct projections (tuple, enum payload), set
struct_tyto the source aggregate’s type — TBAA will use it for type disambiguation regardless - Write test: create an
ArcFunctionwith aProjectinstruction, verifystruct_tyis populated correctly - Verify:
timeout 150 cargo tpasses (all crates)
Sync points: ArcInstr::Project is matched in ~96 locations across ~30 files in ori_arc and ori_llvm (verified via grep -c "Project {" compiler/ori_arc/ compiler/ori_llvm/). Most are destructuring matches that will need struct_ty added. The bulk are in ori_arc (~82 occurrences in 24 files) with ori_llvm contributing ~14 in 6 files. Use grep -rn "Project {" compiler/ori_arc/ compiler/ori_llvm/ to find every site. Consider a staged approach: (1) add field with _struct_ty in all match sites first to compile, (2) populate meaningfully at construction sites, (3) use at consumption sites (Section 04).
This section provides to later sections: Section 04 uses struct_ty at struct_gep emission sites to construct TBAA metadata nodes identifying “field N of struct S.”
Matrix dimensions: Not applicable (structural refactor, no behavioral change).
-
/tpr-reviewpassed — independent review found no critical or major issues (or all findings triaged) -
/impl-hygiene-reviewpassed — hygiene review clean. MUST run AFTER/tpr-reviewis clean. - Subsection close-out (02.3) — MANDATORY before starting the next subsection. Run
/improve-toolingretrospectively on THIS subsection’s debugging journey (per.claude/skills/improve-tooling/SKILL.md“Per-Subsection Workflow”): whichdiagnostics/scripts you ran, where you addeddbg!/tracingcalls, where output was hard to interpret, where test failures gave unhelpful messages, where you ran the same command sequence repeatedly. Forward-look: what tool/log/diagnostic would shorten the next regression in this code path by 10 minutes? Implement improvements NOW (zero deferral) and commit each via SEPARATE/commit-pushusing a valid conventional-commit type (build(diagnostics): ... — surfaced by section-02.3 retrospective—build/test/chore/ci/docsare valid;tools(...)is rejected by the lefthook commit-msg hook). Mandatory even when nothing felt painful. If genuinely no gaps, document briefly: “Retrospective 02.3: no tooling gaps”. Update this subsection’sstatusin section frontmatter tocomplete. -
/sync-claudesection-close doc sync — verify Claude artifacts across all section commits. Map changed crates to rules files, check CLAUDE.md, canon.md. Fix drift NOW. - Repo hygiene check — run
diagnostics/repo-hygiene.sh --checkand clean any detected temp files.
02.R Third Party Review Findings
- None.
02.N Completion Checklist
-
metadata.rsmodule exists withMetadataKinds, TBAA/range/invariant constructors -
IrBuilderhas 5+ metadata attachment methods -
ArcInstr::Projectcarriesstruct_ty: Idx - All
Projectconstruction/match sites updated (grep returns 0 mismatches) - Smoke test: metadata appears in
ORI_DUMP_AFTER_LLVM=1output -
timeout 150 ./test-all.shgreen (debug AND release —cargo b --release && timeout 150 ./test-all.sh) - No spurious warnings
- Plan annotation cleanup:
bash .claude/skills/impl-hygiene-review/plan-annotations.sh --plan 02returns 0 - Plan sync — update plan metadata
- This section
status→complete -
00-overview.mdQuick Reference updated -
index.mdstatus updated - Sections 03/04
depends_onverified
- This section
-
/tpr-reviewpassed -
/impl-hygiene-reviewpassed -
/improve-toolingretrospective completed — MANDATORY at section close, after both reviews are clean. Reflect on the section’s debugging journey (whichdiagnostics/scripts you ran, which command sequences you repeated, where you added ad-hocdbg!/tracingcalls, where output was hard to interpret) and identify any tool/log/diagnostic improvement that would have made this section materially easier OR that would help the next section touching this area. Implement every accepted improvement NOW (zero deferral) and commit each via SEPARATE/commit-push. The retrospective is mandatory even when nothing felt painful — that is exactly when blind spots accumulate. See.claude/skills/improve-tooling/SKILL.md“Retrospective Mode” for the full protocol.
Exit Criteria: IrBuilder can attach TBAA, range, invariant.load, alias.scope, and noalias metadata to instructions. ArcInstr::Project carries struct type information. All existing tests pass unchanged in both debug and release builds. A smoke test demonstrates metadata in the IR dump.