Codegen Verification
Why Verify Generated Code?
A compiler’s correctness is measured not by what it accepts, but by what it produces. A type checker can reject ill-formed programs, and a parser can reject invalid syntax, but the final test of a code generator is whether the machine code it produces does the right thing. For a language with automatic reference counting, “the right thing” includes a particularly treacherous requirement: every allocation must be freed exactly once.
Reference counting bugs are among the hardest to diagnose in compiler development. A missing RcDec does not crash immediately — it produces a memory leak that accumulates silently. An extra RcDec does not fail at the point of the error — it corrupts the refcount of an allocation that may not be accessed again for thousands of instructions, leading to a use-after-free in an entirely different part of the program. A COW operation on a freed pointer does not segfault deterministically — it corrupts whatever happens to occupy that memory now, which might be a completely unrelated data structure.
These bugs survive testing because they depend on allocation patterns, execution order, and timing. A program might run correctly a thousand times and then crash on the thousand-and-first because the allocator happened to reuse memory in a different order. Traditional testing — write input, check output — is necessary but insufficient. What is needed is a way to verify the structure of the generated code, catching RC lifecycle violations before the code ever executes.
Classical Approaches to Generated Code Verification
LLVM’s built-in verifier checks structural invariants of LLVM IR: every instruction’s operands have correct types, every basic block has a terminator, every PHI node has entries for all predecessors. This catches codegen bugs that produce malformed IR (type mismatches, missing terminators) but knows nothing about language-level semantics like reference counting.
Sanitizers (AddressSanitizer, MemorySanitizer, LeakSanitizer) instrument the generated code to detect memory errors at runtime. They catch use-after-free, buffer overflows, and leaks, but they require actually executing the code with inputs that trigger the bug. They are reactive — they find bugs that happen, not bugs that could happen.
Static analysis of generated code walks the generated IR to verify language-level invariants without executing it. This is the approach taken by Ori’s codegen verification system. It operates on in-memory LLVM IR — the same Module, FunctionValue, and InstructionValue objects used during codegen — and checks RC lifecycle rules, COW sequencing constraints, and ABI conformance. The analysis runs during compilation itself, catching issues before the module reaches optimization or execution.
Model checking and formal verification (used in verified compilers like CompCert) prove that the generated code preserves source-level semantics for all possible inputs. This is the gold standard but requires enormous implementation effort and limits the optimizations that can be verified.
Where Ori Sits
Ori’s codegen verification is a lightweight static analysis that walks the generated LLVM IR looking for common patterns of RC bugs. It is not a full model checker — it uses a linear forward scan rather than full dataflow analysis, which means it can miss conditional RC paths. But it is designed to catch the bugs that actually appear in practice: straight-line RC sequences where an allocation is never decremented, where a COW function receives a freed pointer, or where a runtime function is called with the wrong number of arguments.
The analysis is opt-in (gated behind ORI_AUDIT_CODEGEN=1) and zero-cost when disabled — no data structures are allocated, no IR is walked, no functions are called.
What Makes Ori’s Verification Distinctive
In-Pipeline, Not Post-Hoc
Most code analysis tools operate on files — they read textual LLVM IR, parse it, and analyze the parsed representation. This introduces parsing fragility (the tool must handle LLVM IR syntax changes across versions) and adds a compilation step (emit IR to file, then analyze).
Ori’s verification operates on live inkwell IR objects — the same Module, FunctionValue, and InstructionValue types used during codegen. The audit runs inside the compilation pipeline, between code generation and optimization/emission. There is no IR serialization, no file I/O, no parsing. This makes the audit fast, precise, and immune to format changes.
Four Independent Checks
The verification system comprises four independent check modules that share a reporting infrastructure but have no data dependencies:
- RC Balance — tracks pointer states through the RC lifecycle
- COW Sequencing — validates Copy-on-Write operation ordering
- ABI Conformance — verifies calling convention correctness
- Safety Check Density — measures panic/assert coverage (informational)
Each module iterates over every function in the LLVM module (skipping declarations without bodies), applies the function filter if set, and appends findings to a shared AuditReport. Adding a new check module requires no changes to existing modules.
Graduated Severity
Findings are classified by severity:
| Severity | Meaning | Build Impact |
|---|---|---|
| Error | Definite correctness violation | Fails the build |
| Warning | Likely problem (may have false positives in edge cases) | Reported, does not fail |
| Note | Informational observation | Reported, no action needed |
Errors are reserved for findings that represent real violations in normal mode — the system is designed to never produce false positive errors in normal mode. Strict mode may produce false positives (by design) and uses errors more liberally.
The Four Checks
1. RC Balance
The RC balance checker tracks pointer states through the reference counting lifecycle. Every call to ori_rc_alloc records the result SSA name as Live. Subsequent operations transition the state:
```mermaid
flowchart LR
    Alloc["ori_rc_alloc<br>→ Live"] --> COW["COW function<br>→ CowConsumed"]
    Alloc --> Dec["ori_rc_dec<br>→ Decremented"]
    COW --> Dec2["ori_rc_dec<br>⚠ RcDecAfterCow"]
    classDef native fill:#5c3a1e,stroke:#f59e0b,color:#fef3c7
    class Alloc,COW,Dec,Dec2 native
```
At function exit, any pointer still in Live state is flagged:
| Situation | Finding | Severity |
|---|---|---|
| Pointer `Live` at function exit | RcLeak | Warning (Error in strict) |
| `ori_rc_dec` on `CowConsumed` pointer | RcDecAfterCow | Warning |
| `ori_rc_dec` on `Decremented` pointer | RcDoubleDec | Error (strict only) |
The tracker uses an FxHashMap<String, PtrState> keyed by SSA name. This is a linear forward scan, not full CFG dataflow — it handles straight-line RC patterns (which account for 95%+ of codegen output) but may miss conditional RC paths. By design it produces false negatives (misses), never false positives for Error-severity findings in normal mode.
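The linear scan can be sketched as a small state machine. This is a hypothetical, simplified model, not the actual ori_llvm code: instructions are pre-digested into (callee, SSA name) pairs, the standard library `HashMap` stands in for `FxHashMap`, and strict-mode severity handling is omitted.

```rust
use std::collections::HashMap;

// Pointer states from the RC lifecycle described above.
#[derive(Clone, Copy, PartialEq)]
enum PtrState {
    Live,        // produced by ori_rc_alloc, not yet released
    CowConsumed, // passed to a COW function; backing storage may have moved
    Decremented, // passed to ori_rc_dec
}

#[derive(Debug, PartialEq)]
enum Finding {
    RcLeak(String),
    RcDecAfterCow(String),
    RcDoubleDec(String),
}

/// Linear forward scan over one function's calls, given as
/// (callee name, pointer SSA name) pairs.
fn check_rc_balance(calls: &[(&str, &str)]) -> Vec<Finding> {
    let mut states: HashMap<String, PtrState> = HashMap::new();
    let mut findings = Vec::new();
    for &(callee, ptr) in calls {
        match callee {
            "ori_rc_alloc" => {
                states.insert(ptr.to_string(), PtrState::Live);
            }
            "ori_rc_dec" => match states.get(ptr).copied() {
                Some(PtrState::CowConsumed) => {
                    findings.push(Finding::RcDecAfterCow(ptr.to_string()))
                }
                Some(PtrState::Decremented) => {
                    findings.push(Finding::RcDoubleDec(ptr.to_string()))
                }
                _ => {
                    states.insert(ptr.to_string(), PtrState::Decremented);
                }
            },
            // Hypothetical convention: COW runtime functions end in "_cow".
            c if c.ends_with("_cow") => {
                states.insert(ptr.to_string(), PtrState::CowConsumed);
            }
            _ => {}
        }
    }
    // At function exit, anything still Live is a potential leak.
    for (name, state) in &states {
        if *state == PtrState::Live {
            findings.push(Finding::RcLeak(name.clone()));
        }
    }
    findings
}

fn main() {
    let findings = check_rc_balance(&[
        ("ori_rc_alloc", "%3"),      // %3 becomes Live...
        ("ori_list_push_cow", "%3"), // ...then CowConsumed...
        ("ori_rc_dec", "%3"),        // ...so this dec is RcDecAfterCow
        ("ori_rc_alloc", "%7"),      // %7 is never decremented: RcLeak
    ]);
    println!("{findings:?}");
}
```

Because the scan is a single forward pass with no branch merging, a balanced dec on only one side of a conditional is invisible to it, which is exactly the false-negative class the design accepts.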
2. COW Sequencing
The COW sequencing checker validates three ordering rules for Copy-on-Write operations:
Rule 1: No reuse after COW. After ori_list_push_cow(data_ptr, ...), any subsequent use of data_ptr (other than ori_rc_dec) is an error. The COW function may have reallocated the backing storage, invalidating the old pointer.
```
ori_list_push_cow(data, ...)  // COW may reallocate
ori_list_len(data)            // ERROR: data may be freed
```
Rule 2: COW output extraction. Structurally impossible to violate — COW functions return void and write to output allocas. No check needed.
Rule 3: No dec before COW. If ori_rc_dec(ptr) fires before ori_list_push_cow(ptr, ...), the COW function receives a potentially freed pointer.
```
ori_rc_dec(data)              // Might free data
ori_list_push_cow(data, ...)  // ERROR: data may be freed
```
The module tracks two sets per function: cow_consumed (pointers passed to COW functions) and decremented (pointers passed to ori_rc_dec). It walks all instructions, checking call and non-call operands against these sets.
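A minimal sketch of the two-set scan, under the same simplification as above (instructions pre-digested into (callee, pointer operand) pairs, and a hypothetical `_cow` naming convention); the real module walks inkwell instructions and also checks non-call operand uses:

```rust
use std::collections::HashSet;

/// Check COW sequencing rules 1 and 3 over one function's calls.
fn check_cow_sequencing(calls: &[(&str, &str)]) -> Vec<String> {
    let mut cow_consumed: HashSet<&str> = HashSet::new();
    let mut decremented: HashSet<&str> = HashSet::new();
    let mut errors = Vec::new();
    for &(callee, ptr) in calls {
        let is_cow = callee.ends_with("_cow");
        // Rule 1: any use of a COW-consumed pointer other than ori_rc_dec.
        if cow_consumed.contains(ptr) && callee != "ori_rc_dec" {
            errors.push(format!("{ptr} reused after COW (in {callee})"));
        }
        // Rule 3: COW call on a pointer that was already decremented.
        if is_cow && decremented.contains(ptr) {
            errors.push(format!("{ptr} passed to {callee} after ori_rc_dec"));
        }
        if is_cow {
            cow_consumed.insert(ptr);
        } else if callee == "ori_rc_dec" {
            decremented.insert(ptr);
        }
    }
    errors
}

fn main() {
    // Rule 1 violation: the pointer is used again after the COW call.
    let errs = check_cow_sequencing(&[
        ("ori_list_push_cow", "%data"),
        ("ori_list_len", "%data"),
    ]);
    println!("{errs:?}");
}
```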
3. ABI Conformance
Three independent sub-checks validate calling convention correctness:
Large aggregate loads. Detects load %StructType, ptr where the struct exceeds 16 bytes. These trigger LLVM FastISel bugs in JIT mode — the correct pattern is per-field struct_gep + load + insert_value. The 16-byte threshold is computed conservatively by estimated_type_size(), which sums field sizes without padding (real structs are only larger).
Runtime argument count mismatch. Checks calls to ori_* runtime functions against the RT_FUNCTIONS table — the single source of truth for runtime function signatures. If a call has a different operand count than the declared parameter count, it produces an error. This catches codegen bugs where the emitter passes the wrong number of arguments to a runtime function.
Nounwind + invoke conflict. Functions marked Nounwind in RT_FUNCTIONS should be called with call, not invoke (which generates unnecessary landing pads). This is a warning rather than an error because the extra landing pad is wasteful but not incorrect.
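The argument-count sub-check amounts to a table lookup per call site. A sketch with a hypothetical slice of the signature table (the arities below are illustrative placeholders, not the real RT_FUNCTIONS entries):

```rust
use std::collections::HashMap;

// Hypothetical miniature of the runtime signature table. The real
// RT_FUNCTIONS table carries full signatures and attributes.
fn rt_param_counts() -> HashMap<&'static str, usize> {
    HashMap::from([
        ("ori_rc_alloc", 1),
        ("ori_rc_dec", 1),
        ("ori_list_push_cow", 3), // illustrative arity
    ])
}

/// Flag calls to known ori_* functions whose operand count disagrees
/// with the table; unknown callees are ignored.
fn check_arg_counts(calls: &[(&str, usize)]) -> Vec<String> {
    let table = rt_param_counts();
    calls
        .iter()
        .filter_map(|&(callee, argc)| match table.get(callee) {
            Some(&expected) if argc != expected => Some(format!(
                "{callee}: expected {expected} argument(s), got {argc}"
            )),
            _ => None,
        })
        .collect()
}

fn main() {
    println!("{:?}", check_arg_counts(&[("ori_rc_dec", 2), ("ori_rc_alloc", 1)]));
}
```

Keeping the table as the single source of truth means an emitter bug cannot hide behind a matching (but equally wrong) declaration.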
4. Safety Check Density
The safety density checker analyzes how many panic/assert checks appear relative to total instruction count. This is purely informational — all findings are Note severity.
The analysis uses a two-pass approach per function:
- Pass 1: Identify “panic blocks” — basic blocks containing calls to safety functions (`ori_panic`, `ori_assert*`, or any runtime function with `Attr::Cold`).
- Pass 2: Count safety calls and conditional branches targeting panic blocks. Compute density as `check_count / instruction_count * 100`.
Unconditional branches to panic blocks are not counted — they represent inevitable panics (the code always panics there), not guard checks. The density metric helps identify functions that might benefit from more defensive checks or, conversely, functions where excessive checking inflates code size.
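The two passes can be sketched over a simplified block model. The struct fields and panic-function test are hypothetical stand-ins; the real code walks LLVM basic blocks, call sites, and branch instructions, and also treats `Attr::Cold` runtime functions as safety functions:

```rust
// Simplified model of one basic block for the density computation.
struct Block<'a> {
    name: &'a str,
    calls: Vec<&'a str>,              // callee names in this block
    cond_branch_targets: Vec<&'a str>, // targets of conditional branches
    instr_count: usize,
}

fn is_safety_fn(name: &str) -> bool {
    name == "ori_panic" || name.starts_with("ori_assert")
}

/// Density of safety checks per instruction, as a percentage.
/// Assumes at least one instruction in the function.
fn safety_density(blocks: &[Block]) -> f64 {
    // Pass 1: blocks containing a call to a safety function are panic blocks.
    let panic_blocks: Vec<&str> = blocks
        .iter()
        .filter(|b| b.calls.iter().any(|c| is_safety_fn(c)))
        .map(|b| b.name)
        .collect();
    // Pass 2: count safety calls plus conditional branches into panic blocks.
    // Unconditional branches are deliberately not counted (inevitable panics
    // are not guard checks).
    let mut checks = 0usize;
    let mut instrs = 0usize;
    for b in blocks {
        instrs += b.instr_count;
        checks += b.calls.iter().filter(|c| is_safety_fn(c)).count();
        checks += b
            .cond_branch_targets
            .iter()
            .filter(|t| panic_blocks.contains(t))
            .count();
    }
    checks as f64 / instrs as f64 * 100.0
}

fn main() {
    let blocks = vec![
        Block { name: "entry", calls: vec![], cond_branch_targets: vec!["panic1"], instr_count: 10 },
        Block { name: "panic1", calls: vec!["ori_panic"], cond_branch_targets: vec![], instr_count: 2 },
    ];
    println!("{:.1}% density", safety_density(&blocks));
}
```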
Strict Mode
Setting ORI_AUDIT_STRICT=1 activates pessimistic assumptions:
- COW treated as freeing. COW consumption transitions the pointer directly to `Decremented` instead of `CowConsumed`. Any subsequent `ori_rc_dec` is a definite `RcDoubleDec` error.
- Parameters tracked as RC-managed. Function pointer parameters are recorded as `Live` at entry. If a function receives a pointer but never decrements it, strict mode flags an `RcLeak` error. Normal mode ignores parameters (they may be borrowed).
- Leaks elevated to errors. RC leaks that are Warning in normal mode become Error in strict mode.
Strict mode may produce false positives by design. It is intended for focused investigation of suspected RC bugs, not routine CI use. When a program shows signs of memory corruption (intermittent crashes, growing memory usage), running with strict mode narrows the search to specific functions with suspicious RC patterns.
Function Filtering
Setting ORI_AUDIT_FUNCTION=name restricts the audit to functions whose LLVM name contains the substring name. This is useful for large programs where auditing every function would produce too much output:
```sh
ORI_AUDIT_CODEGEN=1 ORI_AUDIT_FUNCTION=process ori build file.ori
```
This audits only functions whose name contains `process`. The filter applies across all four check modules.
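The substring match itself is simple; a hypothetical helper (the real code reads `ORI_AUDIT_FUNCTION` from the environment) might look like:

```rust
/// Returns true when the function should be audited.
/// With no filter set, every function passes.
fn passes_filter(llvm_name: &str, filter: Option<&str>) -> bool {
    filter.map_or(true, |f| llvm_name.contains(f))
}

fn main() {
    // In the real pipeline the filter comes from the environment.
    let filter = std::env::var("ORI_AUDIT_FUNCTION").ok();
    for name in ["_ori_process_list", "_ori_main"] {
        if passes_filter(name, filter.as_deref()) {
            println!("auditing {name}");
        }
    }
}
```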
Reporting
All findings are collected in an AuditReport:
```
codegen audit: warning: [_ori_process_list] RC pointer %3 still Live at function exit (potential leak)
codegen audit: error: [_ori_update_data] COW input %data reused after ori_list_push_cow
codegen audit: note: [_ori_main] 3 safety checks in 47 instructions (6.4% density)
codegen audit summary: 1 error(s), 1 warning(s), 1 note(s)
```
All strings are owned (String, not &str) because the report must outlive inkwell’s LLVM context — a subtlety of LLVM’s memory management that leaks through to any tool that operates on live LLVM objects.
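A minimal sketch of the report shape, with illustrative field and method names; the essential detail is that `push` copies into owned `String`s immediately, so nothing in the report borrows from LLVM-owned memory:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Severity { Error, Warning, Note }

struct Finding {
    severity: Severity,
    function: String, // owned: must outlive the LLVM context
    message: String,
}

#[derive(Default)]
struct AuditReport { findings: Vec<Finding> }

impl AuditReport {
    fn push(&mut self, severity: Severity, function: &str, message: &str) {
        // to_string() copies out of LLVM-owned memory right away, so the
        // report remains valid after the inkwell Context is dropped.
        self.findings.push(Finding {
            severity,
            function: function.to_string(),
            message: message.to_string(),
        });
    }

    fn has_errors(&self) -> bool {
        self.findings.iter().any(|f| f.severity == Severity::Error)
    }
}

fn main() {
    let mut report = AuditReport::default();
    report.push(Severity::Warning, "_ori_process_list", "RC pointer %3 still Live at function exit");
    report.push(Severity::Error, "_ori_update_data", "COW input %data reused after ori_list_push_cow");
    for f in &report.findings {
        println!("[{}] {}", f.function, f.message);
    }
    assert!(report.has_errors());
}
```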
Exit Behavior
The audit can fail the build. If audit_report.has_errors() returns true after emitting to stderr, compilation aborts with an error message reporting the error count. Warnings and notes do not block compilation.
In the AOT path, the audit runs inside dbg_do! (zero cost in release builds). In the JIT path, the audit runs behind a raw audit_requested() check (the ori_llvm crate cannot depend on oric for the dbg_do! macro).
The codegen-audit.sh Diagnostic Script
The shell script diagnostics/codegen-audit.sh wraps the audit with a convenient CLI:
```sh
diagnostics/codegen-audit.sh [options] <file.ori>
```
| Flag | Effect |
|---|---|
| `--strict` | Sets `ORI_AUDIT_STRICT=1` |
| `--function <name>` | Sets `ORI_AUDIT_FUNCTION=<name>` |
| `--color` / `--no-color` | Force or disable colored output |
| `-h` / `--help` | Show usage |
The script invokes ori build with ORI_AUDIT_CODEGEN=1, captures stderr, colorizes findings (errors in red, warnings in yellow, summary in bold), and determines the exit code from the summary line. It cleans up any compiled binary after the audit completes.
| Exit Code | Meaning |
|---|---|
| 0 | Clean — no findings |
| 1 | Findings detected |
| 2 | Usage error or compilation failure |
Prior Art
LLVM Verifier — LLVM’s built-in verifier checks structural IR invariants (type correctness, CFG well-formedness, PHI node consistency). Ori’s verification is complementary — it checks language-level invariants (RC lifecycle, COW sequencing) that LLVM has no knowledge of. Both run during compilation: LLVM’s verifier before optimization, Ori’s verifier before emission.
Miri (Rust) — Miri is an interpreter for Rust’s MIR that detects undefined behavior, including use-after-free, data races, and invalid pointer arithmetic. Miri operates at runtime (interpreting MIR), while Ori’s audit operates at compile time (analyzing LLVM IR). Miri catches more bugs (it tracks every memory operation) but requires test inputs that exercise the buggy code paths. Ori’s audit catches structural violations regardless of test coverage.
Swift SIL Verifier — Swift’s SIL verifier checks ARC-specific invariants: balanced retain/release pairs, correct ownership conventions, and valid borrow scopes. This is the closest prior art to Ori’s RC balance checker. Swift’s verifier operates on SIL (before LLVM IR generation), while Ori’s operates on LLVM IR (after generation). Swift’s approach catches errors earlier in the pipeline; Ori’s approach catches errors in the final output.
Valgrind — Valgrind’s Memcheck tool detects memory errors (leaks, use-after-free, uninitialized reads) by instrumenting the executable at runtime. Ori provides Valgrind integration through diagnostics/valgrind-aot.sh, but Valgrind requires actually running the program. The codegen audit provides complementary coverage: it catches structural RC violations without execution, while Valgrind catches dynamic memory errors that the static analysis misses.
CompCert — CompCert is a formally verified C compiler that proves semantic preservation from source to assembly. This is the gold standard for compiler verification but requires immense effort (CompCert’s verification took years of proof engineering). Ori’s codegen audit is far less rigorous but also far more practical — it took days to implement, not years, and catches the specific class of bugs (RC lifecycle violations) that are most dangerous for an ARC-based language.
Design Tradeoffs
Linear scan vs. full dataflow analysis. The audit uses a forward linear scan per function, not CFG-aware dataflow analysis. A dataflow approach would catch conditional RC paths (where only one branch decrements) but would be significantly more complex and slower. The linear scan handles 95%+ of real RC patterns and was implemented in days rather than weeks. The deliberate choice is a fast, simple scan that catches most bugs over a thorough analysis that catches all of them.
False negatives by design. The audit accepts that it will miss some bugs (conditional RC paths, inter-procedural flows, aliased pointers) in exchange for never producing false positive errors in normal mode. Every Error finding in normal mode represents a real violation. This is the right trade-off for an audit that can fail the build — false positives would train developers to ignore the audit.
Opt-in vs. always-on. The audit is gated behind an environment variable rather than running on every build. Always-on would catch more bugs but would add compilation time and produce noise (notes and warnings on every build). The opt-in approach lets developers enable the audit when investigating suspected RC bugs, and keeps the normal compilation path fast and quiet.
Live IR vs. textual IR. Operating on inkwell IR objects rather than parsing textual LLVM IR makes the audit fast and immune to format changes, but couples it to inkwell’s API. If Ori ever switches LLVM bindings (to a different Rust wrapper or direct C API calls), the audit would need porting. The coupling is acceptable because inkwell is deeply embedded in the backend — any binding change would require rewriting far more than just the audit.
Separate check modules vs. unified analysis. The four checks are independent modules with no data dependencies. A unified analysis — tracking RC state, COW state, and ABI correctness in a single pass — would be faster (one walk instead of four) but would tangle the concerns. Independent modules are easier to maintain, test, and extend. The overhead of multiple passes is negligible relative to LLVM instruction emission.