Section 03: Orchestrator Skill

Status: Not Started

Goal: Rewrite SKILL.md to compose real diagnostic tools with AI IR analysis, replacing the broken Python scoring pipeline. The orchestrator preserves the proven main agent / background agent separation but changes what the background agent does.

Success Criteria:

/code-journey 01-arithmetic.ori produces a valid JSON result file
Fast mode (diagnostic tools only) runs in < 30s per journey
Deep mode (tools + AI) runs in < 5 min per journey
Tool failures block AI analysis (not the reverse)
All modes work: single file, --add N, --summary, --infinity, run-all
Satisfies mission criteria for JSON + markdown output

Context: The main agent’s Steps 0-2 (build run list, run eval/AOT, capture phase dumps) are proven infrastructure; reuse them exactly. Step 3 (spawn background agent) keeps its place in the architecture, but the background agent’s instructions change completely. Step 4 (status/mode logic) is reusable. Step 5 (overview generation) is rewritten to generate the overview from the JSON results.

Reference implementations:

Current SKILL.md Steps 0-2: Phase capture with env vars and temp files
diagnostics/diagnose-aot.sh: Multi-tool composition pattern (7 sections, sequential execution, status tracking)

Depends on: Section 02 (JSON schema — must know what to produce) and Section 04 (tool JSON output — can use structured output if available).

03.1 Main Agent Orchestration

File(s): .claude/skills/code-journey/SKILL.md

The main agent’s responsibilities are unchanged from the current system. Rebuild the SKILL.md skeleton (from Section 01.2) into the complete new orchestrator.

Preserve Step 0: Build the Run List — scan plans/code-journeys/*.ori, parse args, determine mode
Preserve Step 1: Run Both Paths — eval (cargo run -- run file.ori), AOT (ori build file.ori + execute binary), capture exit codes + stdout to /tmp/journey_N/
Preserve Step 2: Capture Phase Dumps — all env vars and temp files:
- ORI_LOG=ori_lexer=debug → lexer.txt
- ORI_LOG=ori_parse=debug → parser.txt
- ORI_LOG=ori_types=debug → typeck.txt
- ORI_DUMP_AFTER_ARC=1 → arc_ir.txt
- ORI_DUMP_AFTER_LLVM=1 → llvm_ir.txt
- size -A binary → sections.txt
- objdump -d binary → disasm.txt

Add Step 2.5: Capture provenance metadata to /tmp/journey_N/provenance.json:

{ "commit": "$(git rev-parse HEAD)", "dirty": $(git diff --quiet && echo false || echo true), "timestamp": "$(date -Iseconds)" }
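
The template above mixes shell substitutions into a JSON literal, so it needs a small wrapper to run. One possible sketch is shown here. The helper name write_provenance and the "unknown" fallback outside a git checkout are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: write_provenance is a hypothetical helper name.
# Usage: write_provenance /tmp/journey_N
write_provenance() {
  local out_dir="$1"
  local commit dirty timestamp

  # Assumption: fall back to "unknown" when not inside a git checkout,
  # so the file is still valid JSON.
  commit="$(git rev-parse HEAD 2>/dev/null || echo unknown)"

  # git diff --quiet exits 0 when tracked files have no unstaged changes.
  # Staged-only changes and untracked files are not detected by this check.
  if git diff --quiet 2>/dev/null; then
    dirty=false
  else
    dirty=true
  fi

  # -Iseconds is supported by GNU date; other platforms may need a format string.
  timestamp="$(date -Iseconds)"

  mkdir -p "$out_dir"
  printf '{ "commit": "%s", "dirty": %s, "timestamp": "%s" }\n' \
    "$commit" "$dirty" "$timestamp" > "$out_dir/provenance.json"
}
```

The dirty value is written without quotes so it lands in the file as a JSON boolean rather than a string.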

Rewrite Step 3: Spawn background agent with new prompt template (see 03.2/03.3)
- Pass: journey metadata, source file path, eval/AOT results, temp dir path, analysis mode (fast/deep)
- Default mode: deep when running single journey or run-all; fast when running --add batch
Preserve Step 4: Mode-dependent continue/stop logic
Rewrite Step 5: Overview generation reads JSON results, not YAML frontmatter (deferred to Section 06)
Add --fast flag: force fast mode on any invocation
Add --deep flag: force deep mode on any invocation (already the default for single journey and run-all)
Subsection close-out (03.1) — MANDATORY before starting 03.2:
- All tasks above are [x] and SKILL.md main agent flow verified
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection
- Run /sync-claude on THIS subsection — check whether code changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no API/command/phase changes, document briefly. Fix any drift NOW.
- Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
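
The mode defaults and flag overrides from Step 3 and the --fast/--deep tasks can be sketched as a small selection function. The helper name select_mode and the invocation labels (single, run-all, add) are hypothetical. Defaulting every other invocation kind to deep, and letting the last flag win when both are given, are assumptions the plan does not settle.

```shell
#!/usr/bin/env bash
# Sketch only: select_mode and its invocation labels are hypothetical.
# Usage: select_mode <single|run-all|add|...> [flags...]
select_mode() {
  local kind="$1"
  shift
  local mode arg

  # Defaults from the plan: deep for single journey and run-all, fast for --add batches.
  case "$kind" in
    add)            mode=fast ;;
    single|run-all) mode=deep ;;
    *)              mode=deep ;;  # assumption: unspecified kinds default to deep
  esac

  # Explicit flags override the default; the last flag given wins (assumption).
  for arg in "$@"; do
    case "$arg" in
      --fast) mode=fast ;;
      --deep) mode=deep ;;
    esac
  done

  printf '%s\n' "$mode"
}
```

For example, `select_mode add --deep` prints deep, while `select_mode add` prints fast.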

03.2 Background Agent — Diagnostic Tools (Fast Mode)

File(s): .claude/skills/code-journey/SKILL.md (background agent prompt template)

The background agent receives the temp directory with all phase dumps and runs diagnostic tools against the journey program. This is the “fast mode” — deterministic, < 30 seconds.

Define the background agent prompt template for fast mode. The agent:
1. Reads journey metadata (number, slug, theme, difficulty, features, expected result)
2. Reads eval/AOT exit codes and stdout from temp files
3. Runs ORI_CHECK_LEAKS=1 /tmp/journey_N/binary → captures live count + status
4. Runs diagnostics/rc-stats.sh file.ori → captures per-function balance (use --json if available from Section 04, otherwise parse text output)
5. Runs diagnostics/codegen-audit.sh file.ori → captures findings (use --json if available, otherwise parse text)
6. Runs timeout 150 diagnostics/dual-exec-verify.sh --test-only --json file.ori (if applicable)
7. Reads llvm_ir.txt metadata: line count, function count (grep for ^define)
8. Reads sections.txt for binary size data
9. Reads provenance.json
10. Assembles the JSON result per Section 02 schema (findings from tools only, ai_gate = “skip”)
11. Writes plans/code-journeys/NN-slug-results.json
12. Generates plans/code-journeys/NN-slug-results.md from JSON (template in Section 06)
13. Cleans up /tmp/journey_N/
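
Step 7 above (line count and function count from llvm_ir.txt) can be sketched as follows. The helper name ir_metadata and the output keys line_count and function_count are hypothetical; the real field names come from the Section 02 schema.

```shell
#!/usr/bin/env bash
# Sketch only: ir_metadata and its JSON keys are hypothetical names.
# Usage: ir_metadata /tmp/journey_N/llvm_ir.txt
ir_metadata() {
  local ir_file="$1"
  local lines functions

  # wc -l counts newline characters; tr strips padding some platforms add.
  lines="$(wc -l < "$ir_file" | tr -d ' ')"

  # Function definitions in textual LLVM IR start with "define" at column 0.
  # grep -c prints 0 but exits 1 when nothing matches, so ignore its status.
  functions="$(grep -c '^define' "$ir_file" || true)"

  printf '{ "line_count": %s, "function_count": %s }\n' "$lines" "$functions"
}
```

Lines starting with declare are external declarations and are deliberately not counted as functions.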
Tool failure handling:
- If AOT compilation fails: execution.aot.status = “compile_fail”, skip leak_check + rc_stats + codegen_audit, set diagnostic statuses to “skip”
- If eval fails but AOT succeeds (or vice versa): record parity mismatch as a finding
- If any diagnostic tool crashes: record as finding with category “tool_failure”, continue other tools
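
The "record and continue" rule for crashing tools can be sketched as a wrapper around each tool invocation. The helper name run_tool, the FINDINGS_FILE variable, and the finding fields other than the "tool_failure" category are hypothetical. Treating any non-zero exit status as a crash is also an assumption; a tool that uses its exit code to report findings would need different handling.

```shell
#!/usr/bin/env bash
# Sketch only: run_tool and FINDINGS_FILE are hypothetical names.
# Findings are appended one JSON object per line.
FINDINGS_FILE="${FINDINGS_FILE:-$(mktemp)}"

# Usage: run_tool NAME OUT_FILE COMMAND [ARGS...]
run_tool() {
  local name="$1" out_file="$2"
  shift 2
  local status=0

  # Capture the tool's output; remember the exit status instead of aborting.
  "$@" > "$out_file" 2>&1 || status=$?

  if [ "$status" -ne 0 ]; then
    # Assumption: any non-zero exit counts as a tool crash.
    printf '{ "category": "tool_failure", "tool": "%s", "exit_code": %d }\n' \
      "$name" "$status" >> "$FINDINGS_FILE"
  fi

  # Always succeed so the remaining tools still run.
  return 0
}

# Example calls (output file names are illustrative):
#   run_tool rc_stats      /tmp/journey_1/rc_stats.txt      diagnostics/rc-stats.sh file.ori
#   run_tool codegen_audit /tmp/journey_1/codegen_audit.txt diagnostics/codegen-audit.sh file.ori
```

Returning 0 unconditionally keeps one crashing tool from masking the results of the others.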
Test fast mode: run on J01 (arithmetic — simplest journey), verify JSON output matches schema
Subsection close-out (03.2) — MANDATORY before starting 03.3:
- All tasks above are [x] and fast mode produces valid JSON
- Update this subsection’s status in section frontmatter to complete
- Run /improve-tooling retrospectively on THIS subsection
- Run /sync-claude on THIS subsection — check whether code changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no API/command/phase changes, document briefly. Fix any drift NOW.
- Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

03.3 Background Agent — AI IR Analysis (Deep Mode)

File(s): .claude/skills/code-journey/SKILL.md (background agent prompt template, deep mode extension)

Deep mode runs only after fast mode’s diagnostic tools complete successfully. If the tools find blocking issues (compilation failure, parity mismatch), AI analysis is skipped (tool_gate = “fail”, ai_gate = “skip”). Otherwise, the agent reads the LLVM IR and performs structured analysis.
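
The gating rule above can be sketched as a pure function of the tool results. The helper name decide_gates, the parity label "mismatch", and the gate values "pass" and "run" are assumptions; only "fail", "skip", and "compile_fail" appear in this plan.

```shell
#!/usr/bin/env bash
# Sketch only: decide_gates is a hypothetical helper.
# "pass" and "run" are assumed gate values; the plan names only "fail" and "skip".
# Usage: decide_gates <aot_status> <parity> <mode>
decide_gates() {
  local aot_status="$1" parity="$2" mode="$3"
  local tool_gate=pass ai_gate=run

  if [ "$aot_status" = "compile_fail" ] || [ "$parity" = "mismatch" ]; then
    # Blocking tool results stop AI analysis, never the other way round.
    tool_gate=fail
    ai_gate=skip
  elif [ "$mode" = "fast" ]; then
    # Fast mode reports tool findings only.
    ai_gate=skip
  fi

  printf '{ "tool_gate": "%s", "ai_gate": "%s" }\n' "$tool_gate" "$ai_gate"
}
```

Keeping the decision separate from the tool runs makes the "tool failures block AI analysis" criterion easy to check in isolation.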

03.4 JSON Production & Markdown Generation