
s02 — Research Ratchet: LLVM + GCC Optimization Study

Goal

A ranked optimization-priority list for the optimizing tier (s19/s20), derived from which passes actually deliver LLVM's -O2/-O3 gains and where GCC goes further, then filtered through what Ori’s AIMS facts make cheaper or unnecessary.

Implementation Sketch

Read the actual source under ~/projects/reference_repos/lang_repos/llvm-project/ and gcc/ (graph-first). Dossiers go in content/:

  • LLVM pass-pipeline anatomy: the default<O2>/default<O3> pipeline composition (llvm/lib/Passes/PassBuilderPipelines.cpp), which passes dominate measured wins (inlining, SROA, GVN, InstCombine, LICM, loop opts, SLP and loop vectorization), and the inter-pass ordering constraints.
  • LLVM heuristics mining: inline cost model, SROA limits, GVN scope, where heuristic constants encode decades of tuning — captured as TUNING QUESTIONS Ori must answer empirically on its own corpus (rosetta), not constants to copy.
  • LLVM test-intent mining protocol: how to read llvm/test/Transforms/* regression tests for INTENT (the invariant each test pins) and port that intent into BIR-pass tests; the clean-room rule applies (intent, never text).
  • GCC study (study-only per the s00 licensing doc): the IPA framework (gcc/ipa-*.cc: inlining, CP, SRA, pure/const detection), VRP (gcc/tree-vrp.cc and the ranger), and vectorizer organization; capture which IPA and VRP ideas AIMS contracts already subsume (EffectSummary vs IPA pure/const; ReprPlan ranges vs VRP) and which are genuinely additive.
  • Output: the optimization-priority list, ranked, with each entry giving (pass family, expected payoff on Ori workloads, AIMS-fact leverage, verification burden); this list IS the s19 pass-roster input.
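
A minimal sketch of one priority-list entry, assuming a three-point scale per axis (the plan fixes the axes but not their scale or how they combine). Level, PriorityRow and rank are hypothetical names, and the ordering shown (higher payoff first, then lower verification burden, then higher AIMS leverage) is one illustrative choice: whether high AIMS leverage should raise a pass (cheaper to build) or drop it (made unnecessary) is exactly what the dossiers have to settle.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Hypothetical three-point scale; the plan does not fix one.
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class PriorityRow:
    pass_family: str             # e.g. "inlining", "SROA", "GVN"
    expected_payoff: Level       # expected payoff on Ori workloads (rosetta)
    aims_leverage: Level         # how much AIMS facts make the pass cheaper
    verification_burden: Level   # cost of verifying the pass


def rank(rows: list[PriorityRow]) -> list[PriorityRow]:
    """Order rows by an assumed policy, not a decided one.

    Higher payoff first, then lower verification burden, then higher
    AIMS leverage; the family name breaks remaining ties deterministically.
    """
    return sorted(
        rows,
        key=lambda r: (
            -r.expected_payoff,
            r.verification_burden,
            -r.aims_leverage,
            r.pass_family,
        ),
    )
```

Keeping the whole policy in one sort key means it can be swapped in one place once rosetta measurements replace the placeholder scale.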

Constraints

  • GCC: read-only study, zero code reuse (GPLv3 + uniform zero-copy rule).
  • Every “pass X matters” claim cites either a pipeline-source position or a named external measurement; folklore claims are marked as such.
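
The citation rule above is mechanical enough to check. A minimal sketch, assuming each claim carries at most one piece of evidence; the types and the audit function are hypothetical, not an existing tool. Uncited claims fail the rule, and folklore is allowed but returned separately so it stays visibly marked.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class PipelinePosition:
    """A pipeline-source citation, e.g. a position in PassBuilderPipelines.cpp."""
    file: str
    line: int


@dataclass(frozen=True)
class ExternalMeasurement:
    """A named external measurement (a specific benchmark or study)."""
    name: str


@dataclass(frozen=True)
class Folklore:
    """A claim explicitly marked as folklore: permitted, but not evidence."""
    note: str


Evidence = Union[PipelinePosition, ExternalMeasurement, Folklore]


@dataclass(frozen=True)
class Claim:
    text: str                      # e.g. "pass X matters"
    evidence: Optional[Evidence]


def audit(claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (rule violations, folklore to keep flagged)."""
    violations: list[Claim] = []
    folklore: list[Claim] = []
    for claim in claims:
        ev = claim.evidence
        if ev is None:
            violations.append(claim)   # no citation at all
        elif isinstance(ev, PipelinePosition) and ev.line < 1:
            violations.append(claim)   # file:line citation with no real line
        elif isinstance(ev, Folklore):
            folklore.append(claim)     # allowed only because it is marked
    return violations, folklore
```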

Work Items

  • LLVM pass-pipeline anatomy dossier (O2/O3 composition, ordering constraints, dominant passes) with file:line citations.
  • LLVM heuristics dossier: inline/SROA/GVN/LICM cost-model questions reframed as Ori empirical-tuning questions tied to the rosetta corpus.
  • Test-intent mining protocol doc + a worked example (one LLVM transform test family ported as intent into a planned BIR-pass test shape).
  • GCC dossier: IPA/VRP/vectorizer study with an AIMS-subsumption table (already-have / additive / not-applicable per idea).
  • Ranked optimization-priority list for the optimizing tier, consumed verbatim by s19 planning.
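
For the worked example in the test-intent item, a ported intent can be recorded as plain data and screened for copied text before review. A minimal sketch, assuming intents are stored as short prose fields; PortedIntent and clean_room_violations are hypothetical names, and the regex only catches obvious copies (lit RUN lines, FileCheck CHECK-style directives, raw IR define/declare lines), so it supplements the clean-room rule rather than enforcing it.

```python
import re
from dataclasses import dataclass

# Heuristic markers of text lifted from an LLVM lit test: RUN lines,
# CHECK / CHECK-NOT / CHECK-NEXT style directives, or IR function headers.
_COPIED_TEXT = re.compile(
    r"^\s*;?\s*(RUN:|CHECK(-[\w-]+)?:|define\s|declare\s)",
    re.MULTILINE,
)


@dataclass(frozen=True)
class PortedIntent:
    source_family: str   # where the intent was read, e.g. "llvm/test/Transforms/SROA/"
    invariant: str       # the invariant the test family pins, restated in our words
    bir_test_shape: str  # planned shape of the BIR-pass test (hypothetical)


def clean_room_violations(intent: PortedIntent) -> list[str]:
    """Return the names of fields whose text looks copied rather than restated."""
    bad = []
    for field in ("invariant", "bir_test_shape"):
        if _COPIED_TEXT.search(getattr(intent, field)):
            bad.append(field)
    return bad
```

The source_family field is deliberately not screened: pointing at where an intent came from is a citation, not copied text.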