s19 — Optimizing Tier: Fact-Driven Mid-End

Goal

The optimizing tier exists without recreating LLVM’s compile-cost failure mode. It is a bounded, per-function rewrite framework over BIR that consumes the s18 fact surface, with the first two optimization waves landed, per-pass translation-validation hooks in place, exact cache dependencies recorded, and determinism and idempotency proven. The tier never changes an AIMS-owned logical event’s identity, multiplicity, order, transfer edge, or cleanup obligation; any folding of physical count operations must be proved separately against the selected compiled plan and must stay inside the deterministic work/IR-growth envelope.

Implementation Sketch

Framework per the s03/s01 decision: if aegraph-style, a bounded acyclic e-graph over one function with elaboration; if rule-pipeline, bounded ordered passes with a shared rewrite driver. Either way, passes are pure functions of (BIR, FactTables, Config, OptimizationBudget, immutable dependencies); module-wide mutable analysis state is forbidden. Cross-function reads use immutable summaries/bodies and return dependency fingerprints for FunctionCodegenKey.
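A minimal sketch of the pure-pass and dependency-fingerprint contract, assuming a toy BIR of string ops and one string summary per callee. The names fingerprint, call_site_pass, and codegen_key, and the choice of SHA-256, are hypothetical and only illustrate the shape: the pass reports exactly which cross-function summaries it read, and the cache key depends on nothing else.

```python
import hashlib
from types import MappingProxyType
from typing import FrozenSet, Mapping, Tuple

Body = Tuple[str, ...]

def fingerprint(callee: str, summary: str) -> str:
    """Hash one immutable callee summary (hypothetical fingerprint scheme)."""
    return hashlib.sha256(f"{callee}|{summary}".encode()).hexdigest()

def call_site_pass(body: Body, summaries: Mapping[str, str]) -> Tuple[Body, FrozenSet[str]]:
    """Pure pass: rewrites calls to 'pure_const' callees and reports what it read."""
    out, deps = [], set()
    for op in body:
        if op.startswith("call "):
            callee = op[len("call "):]
            summary = summaries[callee]
            deps.add(fingerprint(callee, summary))  # record the exact read
            if summary == "pure_const":
                op = f"const_result {callee}"
        out.append(op)
    return tuple(out), frozenset(deps)

def codegen_key(body: Body, deps: FrozenSet[str]) -> str:
    """Stand-in for FunctionCodegenKey: input body plus sorted dependency fingerprints."""
    h = hashlib.sha256()
    for op in body:
        h.update(op.encode() + b"\n")
    for dep in sorted(deps):
        h.update(dep.encode())
    return h.hexdigest()

body = ("call f", "add")
v1 = MappingProxyType({"f": "pure_const", "g": "may_throw"})
v2 = MappingProxyType({"f": "pure_const", "g": "pure_const"})  # only g changed
v3 = MappingProxyType({"f": "may_throw", "g": "may_throw"})    # f changed
key1 = codegen_key(body, call_site_pass(body, v1)[1])
key2 = codegen_key(body, call_site_pass(body, v2)[1])
key3 = codegen_key(body, call_site_pass(body, v3)[1])
```

In this sketch a change to g, which the function never reads, leaves the key untouched, while a change to f invalidates it. That is the narrow invalidation breadth the contract is after.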
Deterministic bounds: every registered pass declares a work unit (node visits, rewrite attempts, candidate insertions, or another reviewable deterministic counter), an input-size-derived fuel limit, and an IR-node/growth limit. The driver enforces per-pass and per-function totals. No “until no change” loop, saturation search, queue, or recursive transform exists without a hard checked counter. Wall-clock time is telemetry, never a cutoff, so machine speed and worker count cannot change output.
Budget exhaustion: stop the offending transform, retain the last verifier-clean BIR, emit structured pass/budget/input-size telemetry, and continue compilation from that BIR. Exhaustion may reduce optimization quality but never semantic correctness, logical-event invariants, compiled-plan satisfaction, or cache validity. Adversarial growth fixtures prove bounded termination and bounded node count/RSS.
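The budget and exhaustion rules can be sketched as follows. This is an illustration under stated assumptions, not the framework's actual API: Budget, PassSpec, run_pipeline, and the two example passes are hypothetical names, the IR is a tuple of strings, and only per-pass limits are modelled (per-function totals are left out for brevity).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

IR = Tuple[str, ...]

class BudgetExhausted(Exception):
    """A pass spent more fuel or grew the IR past its declared limit."""

@dataclass
class Budget:
    fuel_limit: int
    node_limit: int
    fuel_used: int = 0

    def charge(self, units: int = 1) -> None:
        self.fuel_used += units
        if self.fuel_used > self.fuel_limit:
            raise BudgetExhausted("fuel")

    def check_growth(self, node_count: int) -> None:
        if node_count > self.node_limit:
            raise BudgetExhausted("ir-growth")

@dataclass(frozen=True)
class PassSpec:
    name: str
    work_unit: str        # e.g. "node visits"
    fuel_per_node: int    # fuel limit = fuel_per_node * input size
    growth_factor: int    # node limit = growth_factor * input size
    run: Callable[[IR, Budget], IR]

    def __post_init__(self) -> None:
        # An unbudgeted registration is rejected up front.
        if not self.work_unit or self.fuel_per_node <= 0 or self.growth_factor <= 0:
            raise ValueError(f"pass {self.name!r} must declare a work unit and positive limits")

def run_pipeline(ir: IR, passes: List[PassSpec]):
    telemetry = []
    for spec in passes:
        size = len(ir)
        budget = Budget(spec.fuel_per_node * size, spec.growth_factor * size)
        try:
            candidate = spec.run(ir, budget)
            budget.check_growth(len(candidate))
            ir = candidate                     # commit only a completed pass
        except BudgetExhausted as exc:
            # Keep the last good IR, record telemetry, keep going.
            telemetry.append((spec.name, str(exc), size, budget.fuel_used))
    return ir, telemetry

def drop_nops(ir: IR, budget: Budget) -> IR:
    out = []
    for op in ir:
        budget.charge()                        # one unit per node visit
        if op != "nop":
            out.append(op)
    return tuple(out)

def runaway(ir: IR, budget: Budget) -> IR:
    out = list(ir)
    while True:                                # unbounded on its own; the budget stops it
        budget.charge()
        out.append("dup")
        budget.check_growth(len(out))

final_ir, telemetry = run_pipeline(
    ("add", "nop", "mul"),
    [PassSpec("drop_nops", "node visits", 2, 2, drop_nops),
     PassSpec("runaway", "candidate insertions", 4, 2, runaway)],
)
```

Nothing here consults a clock, so the point at which runaway is cut off depends only on input size and the declared limits.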
Ownership guardrail (mechanical, not aspirational): the s06 verifier runs AFTER EVERY PASS. It rejects any change to a frozen logical event’s stable identity, multiplicity, order, transfer edge, or cleanup obligation and any backend-local reanalysis of ownership policy. A target-owned physical optimizer may fuse or change selected actions only when it retains exact event traceability and produces a CompiledLayoutPlan satisfaction proof; unproved physical-action changes fail immediately (LEAK:aims-ad-hoc-emission prevention as an executable invariant).
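A sketch of the post-every-pass guardrail, assuming logical events can be modelled as an ordered tuple of frozen records attached to each function. LogicalEvent, Function, verify_events, and run_guarded are hypothetical names; the real s06 verifier and the CompiledLayoutPlan satisfaction proof are not modelled here.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class LogicalEvent:
    stable_id: str
    multiplicity: int
    transfer_edge: str
    cleanup: str

@dataclass(frozen=True)
class Function:
    ops: Tuple[str, ...]
    events: Tuple[LogicalEvent, ...]   # tuple order stands for logical order

class GuardrailViolation(Exception):
    pass

def verify_events(before: Function, after: Function, pass_name: str) -> None:
    # Identity, multiplicity, order, transfer edge, and cleanup must all be unchanged.
    if after.events != before.events:
        raise GuardrailViolation(f"{pass_name} changed frozen logical events")

def run_guarded(fn: Function, passes: List[Tuple[str, Callable[[Function], Function]]]) -> Function:
    for name, run in passes:
        candidate = run(fn)
        verify_events(fn, candidate, name)   # runs after every pass, not once at the end
        fn = candidate
    return fn

def strip_nops(fn: Function) -> Function:
    return replace(fn, ops=tuple(op for op in fn.ops if op != "nop"))

def reorder_events(fn: Function) -> Function:
    return replace(fn, events=tuple(reversed(fn.events)))

fn = Function(
    ops=("retain x", "nop", "release x"),
    events=(LogicalEvent("e1", 1, "caller->callee", "release"),
            LogicalEvent("e2", 1, "callee->caller", "none")),
)
clean = run_guarded(fn, [("strip_nops", strip_nops)])
```

Passing reorder_events through run_guarded raises GuardrailViolation at the offending pass, which is what makes the failure attributable.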
Wave 1 (from the s02 priority list, fact-leveraged): redundancy elimination (GVN-class over BIR, alias-precise via uniqueness/borrow facts instead of conservative AA), instruction scheduling within blocks (latency-aware list scheduling), isel-quality rewrites (addressing-mode fusion, compare-branch fusion, strength reduction).
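To illustrate what "alias-precise via uniqueness/borrow facts" can mean for redundancy elimination, here is a small block-local sketch of redundant-load elimination. The op encoding, the fact values "unique" and "may_share" as strings, and the function name are assumptions for illustration; a real GVN-class pass covers far more than loads.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]   # (dest, opcode, pointer-or-source)

def eliminate_redundant_loads(block: List[Op], facts: Dict[str, str]) -> List[Op]:
    """Replace a repeated load with a copy when no intervening store can alias it."""
    available: Dict[str, str] = {}   # pointer -> value holding its last loaded contents
    out: List[Op] = []
    for dest, opcode, arg in block:
        if opcode == "load":
            if arg in available:
                out.append((dest, "copy", available[arg]))
                continue
            available[arg] = dest
        elif opcode == "store":
            available.pop(arg, None)             # the stored-to location is stale
            store_fact = facts.get(arg, "may_share")
            for ptr in list(available):
                # Keep a cached load only if a uniqueness fact rules out aliasing.
                if store_fact != "unique" and facts.get(ptr, "may_share") != "unique":
                    del available[ptr]
        out.append((dest, opcode, arg))
    return out

block = [("a", "load", "p"), ("_", "store", "q"), ("b", "load", "p")]
fires = eliminate_redundant_loads(block, {"p": "unique", "q": "may_share"})
blocked = eliminate_redundant_loads(block, {"p": "may_share", "q": "may_share"})
```

With p unique the second load becomes a copy of a; with both pointers marked may_share the block is left unchanged, which mirrors the fire and blocked pins described under Test Strategy.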
Wave 2: loop optimizations (LICM with effect-fact purity — may_allocate/may_throw facts replace LLVM’s conservative re-derivation; induction simplification), call-site optimizations using ReturnContract/ParamContract facts (argument-setup elision for borrowed_read_only, freshness-based store forwarding), BIR-level inlining decision record (whether inlining’s runtime gain repays compile cost and invalidation breadth). If adopted, inlining consumes a bounded immutable callee body/summary and records that exact dependency; it cannot force module-wide invalidation or a mutable global call graph.
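A sketch of effect-fact LICM, assuming SSA-style value ops with no memory reads, an effects table keyed by opcode or callee, and a conservative default for anything without facts. The encoding and the name hoist_invariants are hypothetical. The pass is a single in-order walk over the loop body, so its work is bounded by the body size.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Op = Tuple[str, str, Tuple[str, ...]]   # (dest, opcode-or-callee, args)
CONSERVATIVE: FrozenSet[str] = frozenset({"may_allocate", "may_throw"})

def hoist_invariants(
    body: List[Op],
    defined_outside: Set[str],
    effects: Dict[str, FrozenSet[str]],
) -> Tuple[List[Op], List[Op]]:
    """Split a loop body into (hoisted, remaining) using effect facts for purity."""
    invariant = set(defined_outside)
    hoisted: List[Op] = []
    remaining: List[Op] = []
    for dest, fn, args in body:
        pure = not effects.get(fn, CONSERVATIVE)   # no facts means assume the worst
        if pure and all(a in invariant for a in args):
            hoisted.append((dest, fn, args))
            invariant.add(dest)                    # later ops may build on it
        else:
            remaining.append((dest, fn, args))
    return hoisted, remaining

effects = {
    "mul": frozenset(),
    "add": frozenset(),
    "alloc_box": frozenset({"may_allocate"}),
}
body = [
    ("t1", "mul", ("a", "b")),        # invariant and pure
    ("t2", "alloc_box", ("t1",)),     # invariant operands, but may_allocate
    ("t3", "add", ("t1", "i")),       # depends on induction variable i
    ("t4", "add", ("t1", "a")),       # invariant once t1 is hoisted
]
hoisted, remaining = hoist_invariants(body, {"a", "b"}, effects)
```

Here t1 and t4 move to the preheader, while t2 and t3 stay in the loop.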
Per-pass translation-validation hooks: each pass can emit a (before, after) pair into the s21 validation harness (sampling in CI); hooks land NOW with the framework.
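One possible shape for the hook, sketched under the assumption that sampling is decided by a hash of the function and pass names rather than by a random source, so CI runs sample the same pairs every time. ValidationSink and run_with_hook are hypothetical names, and the s21 harness itself is not modelled.

```python
import hashlib
from typing import Callable, List, Tuple

IR = Tuple[str, ...]

class ValidationSink:
    """Collects (before, after) pairs for a later validation harness."""

    def __init__(self, sample_every: int) -> None:
        if sample_every < 1:
            raise ValueError("sample_every must be >= 1")
        self.sample_every = sample_every
        self.pairs: List[Tuple[str, str, IR, IR]] = []

    def sampled(self, fn_name: str, pass_name: str) -> bool:
        # Deterministic sampling: same names, same decision, on every machine.
        digest = hashlib.sha256(f"{fn_name}/{pass_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.sample_every == 0

    def record(self, fn_name: str, pass_name: str, before: IR, after: IR) -> None:
        if self.sampled(fn_name, pass_name):
            self.pairs.append((fn_name, pass_name, before, after))

def run_with_hook(fn_name: str, ir: IR,
                  passes: List[Tuple[str, Callable[[IR], IR]]],
                  sink: ValidationSink) -> IR:
    for pass_name, run in passes:
        after = run(ir)
        sink.record(fn_name, pass_name, ir, after)
        ir = after
    return ir

def strip_nops(ir: IR) -> IR:
    return tuple(op for op in ir if op != "nop")

def upper(ir: IR) -> IR:
    return tuple(op.upper() for op in ir)

sink = ValidationSink(sample_every=1)   # 1 means capture every pair
result = run_with_hook("main", ("add", "nop"), [("strip_nops", strip_nops), ("upper", upper)], sink)
```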
Opt-level config: fast tier bypasses the mid-end entirely (O0-equivalent); optimizing tier runs only the promoted bounded roster. A pass joins the roster only with a measured runtime win, compile-cost/RSS readings, declared cache dependency breadth, and no s20 blocker. “Runtime win at any compile cost” is not an adoption rule.
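The admission rule can be read as a checklist. The sketch below assumes evidence is recorded per pass as optional numeric readings; the field names, the tier strings, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class PassEvidence:
    name: str
    runtime_win_pct: Optional[float]      # measured, None if never measured
    compile_cost_pct: Optional[float]
    rss_delta_pct: Optional[float]
    dependency_breadth: Optional[str]     # declared cache dependency breadth
    s20_blocker: bool

def admission_gaps(e: PassEvidence) -> List[str]:
    """Return the reasons a pass cannot join the roster (empty list means admit)."""
    gaps = []
    if e.runtime_win_pct is None or e.runtime_win_pct <= 0:
        gaps.append("no measured runtime win")
    if e.compile_cost_pct is None or e.rss_delta_pct is None:
        gaps.append("missing compile-cost/RSS readings")
    if e.dependency_breadth is None:
        gaps.append("undeclared cache dependency breadth")
    if e.s20_blocker:
        gaps.append("s20 blocker")
    return gaps

def roster(tier: str, candidates: List[PassEvidence]) -> List[str]:
    if tier == "fast":
        return []                          # fast tier bypasses the mid-end entirely
    return [e.name for e in candidates if not admission_gaps(e)]

gvn = PassEvidence("gvn", 4.0, 1.5, 0.8, "function-local", False)
unmeasured = PassEvidence("fancy-loop-opt", 12.0, None, None, None, False)
promoted = roster("optimizing", [gvn, unmeasured])
```

The second candidate shows the point of the last sentence above: a large runtime win alone does not admit a pass when its compile cost, RSS, and dependency breadth are unknown.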

Test Strategy

Per-pass matrix: pass × program-shape corpus, checking evaluator/VM/LLVM/native output parity + post-pass logical-event and compiled-plan-satisfaction checks + idempotency (run-twice equality) + determinism (run-N byte-equality) + fuel/IR-growth accounting + declared dependency-key coverage.
Semantic pins: for each wave-1 pass, one before/after pin (a FileCheck-style MIR/BIR snapshot proving the optimization FIRES on its target shape) and one negative pin proving it does NOT fire when a fact forbids it (e.g. GVN blocked by may_share).
Perf evidence: the per-pass ledger records runtime effect, compiler wall/CPU time, peak live BIR nodes, budget use, RSS, and invalidation breadth. A pass with no measurable runtime win, an unexplained compile-cost regression, or an s20 blocker gets a justify-or-park decision.
Pathology pins: adversarial rewrite cycles, combinatorial expression shapes, huge basic blocks, and deep loop nests terminate inside the configured deterministic bound; a deliberately unbudgeted pass registration is rejected.
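The idempotency and determinism checks from the per-pass matrix can be sketched as two small predicates. The helper names and the three example passes are hypothetical, and the IR is again a tuple of strings serialized to bytes for the byte-equality comparison.

```python
from typing import Callable, Tuple

IR = Tuple[str, ...]

def is_idempotent(run: Callable[[IR], IR], ir: IR) -> bool:
    """Run-twice equality: a second run must not change the first run's output."""
    once = run(ir)
    return run(once) == once

def is_deterministic(run: Callable[[IR], IR], ir: IR, runs: int = 5) -> bool:
    """Run-N byte-equality over the serialized output."""
    outputs = {"\n".join(run(ir)).encode() for _ in range(runs)}
    return len(outputs) == 1

def dedupe_adjacent(ir: IR) -> IR:
    out = []
    for op in ir:
        if not out or out[-1] != op:
            out.append(op)
    return tuple(out)

def drop_one_nop(ir: IR) -> IR:
    # Deliberately NOT idempotent: removes only the first nop per run.
    if "nop" in ir:
        i = ir.index("nop")
        return ir[:i] + ir[i + 1:]
    return ir

_calls = {"n": 0}

def stateful_rename(ir: IR) -> IR:
    # Deliberately NOT deterministic: leaks module-wide mutable state into output.
    _calls["n"] += 1
    return tuple(f"{op}.{_calls['n']}" for op in ir)

sample = ("nop", "nop", "add", "add", "nop")
```

An in-process check like this cannot see nondeterminism that only shows up across processes or machines (hash seeds, worker counts), so a real harness would also compare outputs from separate runs.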

Work Items

Bounded per-function mid-end framework (per s03 decision) with pure-pass/dependency contract, deterministic OptimizationBudget + IR-growth enforcement, safe exhaustion, pass-ordering doc, opt-level config, and the post-every-pass logical-event/compiled-plan guardrail.
Per-pass translation-validation hooks wired into the (future s21) harness from day one.
Wave 1: fact-precise GVN-class redundancy + list scheduling + isel-quality rewrites with fire/blocked pins per pass.
Wave 2: effect-fact LICM + induction simplification + contract-fact call-site optimizations + inlining decision record.
Determinism/idempotency/bounded-termination proofs over normal and adversarial corpora + per-pass runtime/compile-cost/RSS/invalidation ledger (justify-or-park for no-win or cost-blocking passes).