s17 — Compile-Speed Baseline + Fast-Tier Throughput Gates
Goal
Compile-speed is measured, not asserted: a per-phase backend benchmark harness, a fair Cranelift baseline harness, and CI-gated fast-tier throughput ratchets aiming at the mission target (beat Cranelift), with every comparison /calc-computed and methodology-documented.
Implementation Sketch
- Backend phase telemetry: per-phase timing (translate/isel/regalloc/encode/object-write) via tracing targets (ORI_LOG=ori_backend=debug) + a criterion bench family in oric/benches/ (backend_lower.rs etc.) following the existing 8-bench precedent.
- Corpus: compile-throughput corpus spanning size classes (bench_hello/small/medium + generated large modules + rosetta program set) — measured as IR-in to object-out (backend-scope) AND end-to-end (user-scope), reported separately.
- Cranelift baseline harness: a standalone Rust harness driving cranelift-codegen over EQUIVALENT input (CLIF translated from the same workload shapes); methodology doc defines fairness (same opt level = none, same target, warm process, median-of-N; functions/sec and MB/s metrics); all arithmetic through /calc; results in a checked-in baseline.json-style artifact with regeneration script.
- Ratchets: per-phase budgets derived from the baseline gap; CI regression gate (>10% regression fails, cow-benchmark.sh precedent); the BEAT-Cranelift line is tracked per size-class in the fast-tier speed-standings artifact (assembled into the s23 promotion ledger).
- Optimization work this section: only structural compile-speed fixes surfaced by telemetry (allocation churn, re-walks) — the architecture (SoA, recycling, single-pass) already encodes the speed design from s01/s03.
Test Strategy
- Bench determinism: benches run sequentially (CPU contention skews — existing benchmark discipline); regression gate self-test with an injected slowdown fixture.
- Semantic pin: the baseline artifact carries corpus SHAs + methodology version; the gate fails on artifact/corpus drift (stale-baseline negative pin).
Work Items
- Per-phase telemetry (tracing targets + criterion bench family) over the throughput corpus.
- Cranelift baseline harness + fairness methodology doc + checked-in baseline artifact with regeneration script (/calc-verified numbers).
- CI compile-speed regression gate (cow-benchmark precedent) + injected-slowdown self-test + stale-baseline pin.
- Structural speed fixes surfaced by telemetry landed; per-size-class fast-vs-Cranelift standings recorded in the fast-tier speed-standings artifact (s23 promotion-ledger input).