Section 03: Orchestrator Skill
Status: Not Started
Goal: Rewrite SKILL.md to compose real diagnostic tools with AI IR analysis, replacing the broken Python scoring pipeline. The orchestrator preserves the proven main agent / background agent separation but changes what the background agent does.
Success Criteria:
- /code-journey 01-arithmetic.ori produces a valid JSON result file
- Fast mode (diagnostic tools only) runs in < 30s per journey
- Deep mode (tools + AI) runs in < 5 min per journey
- Tool failures block AI analysis (not the reverse)
- All modes work: single file, --add N, --summary, --infinity, run-all
- Satisfies mission criteria for JSON + markdown output
Context: The main agent’s Steps 0-2 (build run list, run eval/AOT, capture phase dumps) are proven infrastructure; reuse them exactly. Step 3 (spawn background agent) keeps its place in the flow, but the background agent’s instructions change completely. Step 4 (status/mode logic) is reusable. Step 5 (overview generation) is rewritten to generate the overview from the JSON results.
Reference implementations:
- Current SKILL.md Steps 0-2: Phase capture with env vars and temp files
- diagnose-aot.sh (diagnostics/diagnose-aot.sh): Multi-tool composition pattern (7 sections, sequential execution, status tracking)
Depends on: Section 02 (JSON schema — must know what to produce) and Section 04 (tool JSON output — can use structured output if available).
03.1 Main Agent Orchestration
File(s): .claude/skills/code-journey/SKILL.md
The main agent’s responsibilities are unchanged from the current system. Rebuild the SKILL.md skeleton (from Section 01.2) into the complete new orchestrator.
- Preserve Step 0 (Build the Run List): scan plans/code-journeys/*.ori, parse args, determine mode
- Preserve Step 1 (Run Both Paths): eval (cargo run -- run file.ori), AOT (ori build file.ori + execute binary), capture exit codes + stdout to /tmp/journey_N/
- Preserve Step 2 (Capture Phase Dumps): all env vars and temp files:
  - ORI_LOG=ori_lexer=debug → lexer.txt
  - ORI_LOG=ori_parse=debug → parser.txt
  - ORI_LOG=ori_types=debug → typeck.txt
  - ORI_DUMP_AFTER_ARC=1 → arc_ir.txt
  - ORI_DUMP_AFTER_LLVM=1 → llvm_ir.txt
  - size -A binary → sections.txt
  - objdump -d binary → disasm.txt
- Add Step 2.5: Capture provenance metadata to /tmp/journey_N/provenance.json:
  { "commit": "$(git rev-parse HEAD)", "dirty": $(git diff --quiet && echo false || echo true), "timestamp": "$(date -Iseconds)" }
- Rewrite Step 3: Spawn background agent with new prompt template (see 03.2/03.3)
  - Pass: journey metadata, source file path, eval/AOT results, temp dir path, analysis mode (fast/deep)
  - Default mode: deep when running single journey or run-all; fast when running --add batch
- Preserve Step 4: Mode-dependent continue/stop logic
- Rewrite Step 5: Overview generation reads JSON results, not YAML frontmatter (deferred to Section 06)
- Add --fast flag: force fast mode on any invocation
- Add --deep flag: force deep mode on any invocation (default for single journey)
- Subsection close-out (03.1), MANDATORY before starting 03.2:
  - All tasks above are [x] and SKILL.md main agent flow verified
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection
  - Run /sync-claude on THIS subsection: check whether code changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no API/command/phase changes, document briefly. Fix any drift NOW.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
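The default-mode rule from Step 3 and the --fast / --deep overrides can be sketched as a small pure function. This is only an illustration, not the SKILL.md logic itself: the invocation labels ("single", "run-all", "add") are hypothetical names for the modes listed above, and rejecting both flags together is an assumption, since the plan does not say which flag would win.

```python
from typing import Literal

Mode = Literal["fast", "deep"]


def select_mode(invocation: str, fast: bool = False, deep: bool = False) -> Mode:
    """Pick the analysis mode for one orchestrator invocation.

    `invocation` uses hypothetical labels: "single", "run-all", "add".
    """
    if fast and deep:
        # Assumption: the plan does not define precedence, so reject the combination.
        raise ValueError("--fast and --deep are mutually exclusive")
    if fast:
        return "fast"  # --fast forces fast mode on any invocation
    if deep:
        return "deep"  # --deep forces deep mode on any invocation
    # Defaults from Step 3: deep for a single journey or run-all, fast for --add.
    if invocation == "add":
        return "fast"
    if invocation in ("single", "run-all"):
        return "deep"
    # The plan states no default for other invocations (e.g. --summary, --infinity).
    raise ValueError(f"no default mode defined for invocation {invocation!r}")
```

Keeping the decision in one function means Step 3 only has to pass the resulting mode string to the background agent.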
03.2 Background Agent — Diagnostic Tools (Fast Mode)
File(s): .claude/skills/code-journey/SKILL.md (background agent prompt template)
The background agent receives the temp directory with all phase dumps and runs diagnostic tools against the journey program. This is the “fast mode” — deterministic, < 30 seconds.
- Define the background agent prompt template for fast mode. The agent:
  - Reads journey metadata (number, slug, theme, difficulty, features, expected result)
  - Reads eval/AOT exit codes and stdout from temp files
  - Runs ORI_CHECK_LEAKS=1 /tmp/journey_N/binary → captures live count + status
  - Runs diagnostics/rc-stats.sh file.ori → captures per-function balance (use --json if available from Section 04, otherwise parse text output)
  - Runs diagnostics/codegen-audit.sh file.ori → captures findings (use --json if available, otherwise parse text)
  - Runs timeout 150 diagnostics/dual-exec-verify.sh --test-only --json file.ori → if applicable
  - Reads llvm_ir.txt metadata: line count, function count (grep for ^define)
  - Reads sections.txt for binary size data
  - Reads provenance.json
  - Assembles the JSON result per Section 02 schema (findings from tools only, ai_gate = “skip”)
  - Writes plans/code-journeys/NN-slug-results.json
  - Generates plans/code-journeys/NN-slug-results.md from JSON (template in Section 06)
  - Cleans up /tmp/journey_N/
- Tool failure handling:
  - If AOT compilation fails: execution.aot.status = “compile_fail”, skip leak_check + rc_stats + codegen_audit, set diagnostic statuses to “skip”
  - If eval fails but AOT succeeds (or vice versa): record parity mismatch as a finding
  - If any diagnostic tool crashes: record as finding with category “tool_failure”, continue other tools
- Test fast mode: run on J01 (arithmetic, the simplest journey), verify JSON output matches schema
- Subsection close-out (03.2), MANDATORY before starting 03.3:
  - All tasks above are [x] and fast mode produces valid JSON
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection
  - Run /sync-claude on THIS subsection: check whether code changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no API/command/phase changes, document briefly. Fix any drift NOW.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
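The three tool-failure rules in 03.2 could be expressed roughly as follows. This is a sketch under assumptions, not the prompt template: tools are modelled as plain callables, and the status values "ok" and "fail", the category name "parity_mismatch", and the finding fields are hypothetical. Only "compile_fail", "skip", and "tool_failure" come from the plan; the real field names belong to the Section 02 schema.

```python
from typing import Callable, Dict, List, Tuple

# Diagnostics that need a working AOT binary, per the failure-handling rules.
AOT_DEPENDENT = ("leak_check", "rc_stats", "codegen_audit")


def run_fast_mode(
    eval_ok: bool,
    aot_status: str,
    tools: Dict[str, Callable[[], object]],
) -> Tuple[Dict[str, str], List[dict]]:
    """Run diagnostic tools and return (per-tool statuses, findings)."""
    statuses: Dict[str, str] = {}
    findings: List[dict] = []

    # Rule 2: eval and AOT disagree on success -> parity mismatch finding.
    aot_ok = aot_status == "ok"  # "ok" is an assumed status value
    if eval_ok != aot_ok:
        findings.append({
            "category": "parity_mismatch",  # hypothetical category name
            "title": f"eval ok={eval_ok}, AOT status={aot_status}",
        })

    for name, run in tools.items():
        # Rule 1: AOT compile failure -> skip tools that need the binary.
        if aot_status == "compile_fail" and name in AOT_DEPENDENT:
            statuses[name] = "skip"
            continue
        try:
            run()
            statuses[name] = "ok"
        except Exception as exc:
            # Rule 3: a crashing tool becomes a finding; other tools still run.
            statuses[name] = "fail"
            findings.append({
                "category": "tool_failure",
                "title": f"{name} crashed: {exc}",
            })
    return statuses, findings
```

The loop never re-raises, so one crashing diagnostic cannot prevent the remaining tools from contributing to the JSON result.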
03.3 Background Agent — AI IR Analysis (Deep Mode)
File(s): .claude/skills/code-journey/SKILL.md (background agent prompt template, deep mode extension)
Deep mode runs after fast mode’s diagnostic tools complete successfully. If tools found blocking issues (compilation failure, parity mismatch), AI analysis is skipped (tool_gate = “fail”, ai_gate = “skip”). Otherwise, the agent reads the LLVM IR and performs structured analysis.
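The gating rule in the paragraph above might look like the sketch below. Only tool_gate = "fail" and ai_gate = "skip" appear in the plan; the values "pass" and "run" are assumed placeholders, and the actual vocabulary is whatever the Section 02 schema defines.

```python
from typing import Dict


def decide_gates(mode: str, aot_compiled: bool, parity_ok: bool) -> Dict[str, str]:
    """Decide whether AI analysis may run after the diagnostic tools."""
    # Blocking issues named in the plan: compilation failure, parity mismatch.
    tool_gate = "pass" if (aot_compiled and parity_ok) else "fail"  # "pass" is assumed
    if mode == "fast" or tool_gate == "fail":
        ai_gate = "skip"  # fast mode never runs AI; blocked tools skip it too
    else:
        ai_gate = "run"  # assumed value for "AI analysis proceeds"
    return {"tool_gate": tool_gate, "ai_gate": ai_gate}
```

The dependency runs in one direction only: tool results can block the AI pass, but nothing in the AI pass feeds back into tool_gate.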
- Define the AI analysis prompt structure (hybrid: structured passes + open-ended):
  Structured passes (checklist; agent MUST check each):
  - RC lifecycle: For each function, verify inc/dec balance matches rc-stats output. Look for paths where inc occurs but dec is conditional (branch-dependent leaks).
  - Closure/env ownership: Check that closure environment allocation has matching deallocation. Look for captured fat pointers (str, [T]) whose RC is not properly managed.
  - Iterator/drop semantics: Verify that ori_iter_drop is called on every path (including break, continue, early exit). Check elem_dec_fn invocation.
  - ABI/attributes: Verify nounwind, noalias, readonly, dereferenceable are present where applicable. Flag missing attributes.
  - Control flow: Check for empty blocks, redundant branches (both targets identical), trivial phi nodes, unreachable code after noreturn calls.
  - Aggregate materialization: Look for field-by-field GEP+load+insertvalue chains that should be aggregate loads.
  - Landing pads: Verify invoke+landingpad is only used for functions that CAN throw. Nounwind functions should use call, not invoke.
  - Unwind cleanup: Check that landingpad blocks properly decrement RC for all live values. Missing cleanup = panic-path leak.
  Open-ended anomaly pass:
  - “What would a careful LLVM codegen reviewer flag that the structured passes missed?”
  - Every finding MUST cite exact IR lines as evidence
  - Cross-reference with diagnostic tool results (don’t re-report what tools already found)
- Seed the analysis with historical bug classes from overview.md:
  - Option struct wrapping (+7 unjustified instructions per iteration)
  - [str] double-free in iterator + list cleanup
  - Closure capturing str → unresolved type variable Idx leak
  - Aggregate field-by-field materialization (9:1 reduction opportunity)
  - Landing pad over-generation on nounwind functions
  - SSO guard ptrtoint duplication
- Finding output format: each finding produces a JSON object matching the schema’s findings array item (category, severity, location, title, evidence, source: “ai:ir_analysis”)
- Test deep mode: run on J15 (nested fat, the journey with known LOWs), verify AI analysis finds the dead loop load that tool-only mode might miss
- Subsection close-out (03.3), MANDATORY before starting 03.4:
  - All tasks above are [x] and deep mode produces valid JSON with AI findings
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection
  - Run /sync-claude on THIS subsection: check whether code changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no API/command/phase changes, document briefly. Fix any drift NOW.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
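To make the "cite exact IR lines" requirement concrete, here is a mechanical approximation of one structured check (redundant conditional branches) that emits findings in the shape listed above. It is a sketch only: in the plan this analysis is performed by the agent reading llvm_ir.txt, not by a script, and the category and severity values used here are hypothetical.

```python
import re
from typing import List

# Matches `br i1 <cond>, label %a, label %b`; labels may be quoted (%"name").
_LABEL = r'%(?:"[^"]*"|[-\w.$]+)'
_COND_BR = re.compile(
    r"^\s*br\s+i1\s+[^,]+,\s*label\s+(" + _LABEL + r"),\s*label\s+(" + _LABEL + r")"
)


def find_redundant_branches(ir_text: str) -> List[dict]:
    """Flag conditional branches whose true and false targets are identical."""
    findings = []
    for lineno, line in enumerate(ir_text.splitlines(), start=1):
        match = _COND_BR.match(line)
        if match and match.group(1) == match.group(2):
            findings.append({
                "category": "control_flow",  # hypothetical category name
                "severity": "low",           # hypothetical severity value
                "location": f"llvm_ir.txt:{lineno}",
                "title": "Conditional branch with identical targets",
                "evidence": line.strip(),    # the exact IR line, as required
                "source": "ai:ir_analysis",
            })
    return findings
```

Carrying the 1-based line number in location and the verbatim instruction in evidence lets a reviewer confirm each finding against the dump without re-running the analysis.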
03.4 JSON Production & Markdown Generation
File(s): .claude/skills/code-journey/SKILL.md
- Implement the JSON assembly logic in the background agent prompt:
  - Merge tool results + AI findings into the schema structure
  - Compute finding ID fingerprints (SHA-256 of category+location+title, truncated to 12 hex)
  - Set verdict: clean (0 findings), findings (has findings), blocked (tool_gate fail)
  - Write to plans/code-journeys/NN-slug-results.json
- Implement markdown generation (simple template; full implementation in Section 06):
  - Read JSON, produce markdown with sections: Source, Execution, Diagnostics, Pipeline Summary, Findings, Verdict
  - Write to plans/code-journeys/NN-slug-results.md
- Auto-filing via /add-bug: when findings have severity >= medium AND source is tool-based (not AI note-level):
  - Invoke /add-bug with: subsystem derived from journey features, severity from finding, repro = journey file path, source = “code-journey”
  - Record filed bug ID in finding’s filed_as field
- Subsection close-out (03.4), MANDATORY before marking section complete:
  - All tasks above are [x] and end-to-end flow verified
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection
  - Run /sync-claude on THIS subsection: check whether code changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no API/command/phase changes, document briefly. Fix any drift NOW.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
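The fingerprint, verdict, and auto-filing rules from 03.4 can be sketched in a few lines. Several details are assumptions rather than part of the plan: the "|" separator used when joining the fingerprint fields, the severity ordering, giving "blocked" precedence over the other verdicts, and treating any source that does not start with "ai:" as tool-based.

```python
import hashlib
from typing import List

# Assumed severity ordering; the real levels come from the Section 02 schema.
SEVERITY_RANK = {"note": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


def finding_id(category: str, location: str, title: str) -> str:
    """SHA-256 of category+location+title, truncated to 12 hex characters."""
    # Assumption: join with "|" so field boundaries cannot blur together.
    key = "|".join((category, location, title))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def verdict(findings: List[dict], tool_gate: str) -> str:
    """Return clean / findings / blocked."""
    if tool_gate == "fail":
        return "blocked"  # assumption: a failed tool gate overrides the rest
    return "findings" if findings else "clean"


def should_auto_file(finding: dict) -> bool:
    """True when a finding qualifies for /add-bug (medium+ and tool-sourced)."""
    tool_based = not finding["source"].startswith("ai:")
    return tool_based and SEVERITY_RANK[finding["severity"]] >= SEVERITY_RANK["medium"]
```

Because the fingerprint depends only on category, location, and title, the same finding keeps the same ID across runs, which is what allows a filed bug recorded in filed_as to be matched on a later run.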
03.R Third Party Review Findings
- None.
03.N Completion Checklist
- /code-journey 01-arithmetic.ori produces valid JSON + markdown
- /code-journey 15-fat-nested-collections.ori in deep mode finds the known LOWs
- Fast mode completes in < 30s per journey
- Deep mode completes in < 5 min per journey
- All modes work (single file, --add, --summary, --infinity, run-all)
- --fast and --deep flags override default mode selection
- /add-bug integration fires on medium+ findings
- timeout 150 ./test-all.sh green
- /tpr-review: dual-source review of orchestrator
- /impl-hygiene-review: verify SSOT (JSON is sole canonical artifact), no LEAK (background agent owns all post-capture)
- /improve-tooling section-close sweep