0%

s17 — Compile-Speed Baseline + Fast-Tier Throughput Gates

Goal

Compile-speed is measured, not asserted: a per-phase backend benchmark harness, a fair Cranelift baseline harness, and CI-gated fast-tier throughput ratchets aiming at the mission target (beat Cranelift), with every comparison /calc-computed and methodology-documented.

Implementation Sketch

  • Backend phase telemetry: per-phase timing (translate/isel/regalloc/encode/object-write) via tracing targets (ORI_LOG=ori_backend=debug) + a criterion bench family in oric/benches/ (backend_lower.rs etc.) following the existing 8-bench precedent.
  • Corpus: compile-throughput corpus spanning size classes (bench_hello/small/medium + generated large modules + rosetta program set) — measured as IR-in to object-out (backend-scope) AND end-to-end (user-scope), reported separately.
  • Cranelift baseline harness: a standalone Rust harness driving cranelift-codegen over EQUIVALENT input (CLIF translated from the same workload shapes); methodology doc defines fairness (same opt level = none, same target, warm process, median-of-N; functions/sec and MB/s metrics); all arithmetic through /calc; results in a checked-in baseline.json-style artifact with regeneration script.
  • Ratchets: per-phase budgets derived from the baseline gap; CI regression gate (>10% regression fails, cow-benchmark.sh precedent); the BEAT-Cranelift line is tracked per size-class in the fast-tier speed-standings artifact (assembled into the s23 promotion ledger).
  • Optimization work this section: only structural compile-speed fixes surfaced by telemetry (allocation churn, re-walks) — the architecture (SoA, recycling, single-pass) already encodes the speed design from s01/s03.

Test Strategy

  • Bench determinism: benches run sequentially (CPU contention skews — existing benchmark discipline); regression gate self-test with an injected slowdown fixture.
  • Semantic pin: the baseline artifact carries corpus SHAs + methodology version; the gate fails on artifact/corpus drift (stale-baseline negative pin).

Work Items

  • Per-phase telemetry (tracing targets + criterion bench family) over the throughput corpus.
  • Cranelift baseline harness + fairness methodology doc + checked-in baseline artifact with regeneration script (/calc-verified numbers).
  • CI compile-speed regression gate (cow-benchmark precedent) + injected-slowdown self-test + stale-baseline pin.
  • Structural speed fixes surfaced by telemetry landed; per-size-class fast-vs-Cranelift standings recorded in the fast-tier speed-standings artifact (s23 promotion-ledger input).