Section 12 — Verification + Autopilot Soak

Cites §01 north star (invariants 6, 7, 8, 16, 18); consumes every prior section. Final proof the rebuild embodies the thesis.

Goal

See frontmatter. This section validates that the rebuilt system does what the subject plan only described: scripts drive, LLM only codes, zero halts, one autonomous cross-plan system.

Intelligence Reconnaissance

  • scripts/intel-query.sh dag-ascii scripts-first-restructure --repo ori — §12 is the terminal node (predecessor §11C, no successor); strict-linear chain §09 → §09B → §10 → §11A → §11B → §11C → §12.
  • scripts/intel-query.sh plan-status scripts-first-restructure --repo ori — §12 is the sole open section; §11C complete; §11A/§11B superseded; §02-§10 complete.
  • scripts/intel-query.sh similar build_manifest --repo ori — the determinism-pin callers of the resume-manifest builder the §12.1 cross-plan determinism assertion grounds on.
  • [ori:scripts/plan_corpus/resume_manifest.py] build_manifest — zero-context resume manifest; the byte-identical-across-process determinism contract (the §12.1 cross-plan determinism pin).
  • [ori:scripts/plan_corpus/cross_section_check.py] detect_invariant_acceptance_hook_missing (:756) / TARGETED_DETECTORS string entry (:832, decl :831 — deliberately excluded from ALL_DETECTORS) / gate-wire guard (:892, call :893) — the detector + soak surfaces §12 grinds.
  • [ori:scripts/plan_orchestrator/route_walk.py] dynamic_grow drain — the §11C.3 boundary-contract next_action.exit_kind the soak exercises for ≥2 growth cycles.
  • [ori:plans/completed/scripts-first-workflow-architecture/00-overview.md] — the cross-plan dogfood subject, verified status: complete (54/54 terminal: 35 completed + 19 superseded) → §09.4 dogfood gate + depends_on carrier-(b) satisfied.
  • Summary (≤500 chars): §12 is the strict-linear terminal node (predecessor §11C). The capstone soak grinds the merged route for N fake-clock hours + ≥4h real soak (§12.2), asserts dynamic_grow ≥2-cycle then plan_terminus, INV-16/17/18 end-to-end, cross_plan_all_deferred resume, USER_DECISION/QUARANTINE → structured_exit. Determinism pinned on resume_manifest.build_manifest. Dogfood subject already 54/54 terminal. ISO 2026-06-02.

Cross-plan close-out dependency

This plan cannot reach close-out until plans/completed/scripts-first-workflow-architecture (the v7 dogfood subject) reaches status: complete, per 00-overview.md §Cross-plan dependency.

The §12 cross-plan blocker is enforced structurally (depends_on edges + a close-gate), not by a frontmatter field:

  • scripts-first-restructure is a legacy-markdown plan (route_v7.detect_route_mode → markdown; no plan.json).
  • Per the 09B.4 carrier-select rule (decisions/05-three-plan-types.md INV-22), the edge carrier is the plan’s TYPE; for a legacy-markdown plan that carrier is (b) the up-chain mechanical gate — NOT a v7 cross_plan_needs schema field.
  • NOT gated on §10 converting this plan to v7 (§10 was re-scoped by Decision-05: no corpus-wide JSON conversion).

Carrier (b) — the mechanical edge that carries the blocker structurally:

  • §09.4 dogfood gate (in §09, status: complete) HARD-BLOCKS §09 close until the subject plan reaches status: complete — verified met (subject 54/54 terminal: 35 completed + 19 superseded).
  • depends_on chain: §10 depends_on §09B depends_on §09; §12 reaches §09 transitively through the strict-linear chain (§09 → §09B → §10 → §11A → §11B → §11C → §12). So nothing past §09 — including this §12 — can run until the subject plan finishes.

The two together (dogfood-gate + depends_on chain) are the structural carrier: the blocker is enforced by depends_on edges + a close-gate that holds the chain, not by a §12 frontmatter field. The schema-field carrier (a) applies only AFTER an explicit/opt-in convert_plan_dir_v7 of this plan, which §10 does NOT mandate; while the plan stays legacy-markdown, carrier (b) is the permanent structural edge.
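
The transitive reach that carrier (b) relies on can be sketched as a small graph walk. This is an illustrative sketch only: the DEPENDS_ON table is hypothetical, its edges past §10 are assumed from the strict-linear chain stated above, and upstream / is_blocked are made-up helper names, not functions from the repo.

```python
# Hypothetical depends_on table. The §09B and §10 edges are stated in the text;
# the remaining edges are ASSUMED from the strict-linear chain
# §09 -> §09B -> §10 -> §11A -> §11B -> §11C -> §12.
DEPENDS_ON = {
    "09B": ["09"],
    "10": ["09B"],
    "11A": ["10"],
    "11B": ["11A"],
    "11C": ["11B"],
    "12": ["11C"],
}


def upstream(section, depends_on):
    """Return every section reachable by following depends_on edges."""
    seen = set()
    stack = list(depends_on.get(section, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends_on.get(current, []))
    return seen


def is_blocked(section, open_sections, depends_on=DEPENDS_ON):
    """A section is blocked while any transitive predecessor is still open."""
    return bool(upstream(section, depends_on) & set(open_sections))
```

Under these assumptions, is_blocked("12", {"09"}) is true even though §12 declares no direct edge to §09, which is the property the dogfood gate on §09 depends on.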

The subject plan’s terminal review-findings section records the handback to this plan. §12.3’s autonomous-corpus soak then includes the now-complete subject plan as proof that the json-schema system grinds real work to done with zero halts.

12.1 — Aggregate acceptance + cross-plan determinism

  • Aggregate the §02-§11C per-node acceptance suites into one blocking suite registered in compiler_repo/test-all.sh. The §12 work authors the suite-registration entry in compiler_repo/test-all.sh plus an aggregate runner at scripts/plan_corpus/tests/test_aggregate_acceptance.py that imports each per-section suite and fails the whole suite if any per-node test is red.
  • Pin cross-plan determinism in scripts/plan_corpus/tests/test_resume_manifest_determinism.py: the (key, plan_id, node_id) merge is byte-identical across in-process / CLI-subprocess / fresh-process (the /clear-equivalent matrix), and resume_manifest.py:build_manifest reconstructs an identical resume pointer from disk alone (zero-context resume, invariant 6) per state-discipline.md §6.5.
  • Add canonical exit_reason routing-parity to the aggregate gate: register python -m scripts.plan_orchestrator.exit_reasons_check and scripts/plan_corpus/tests/test_next_action_mapping_completeness.py in the aggregate suite + compiler_repo/test-all.sh, so the aggregate gate fails if any next_action exit_reason is unmapped or any canonical exit_reason lacks a next_action route (parity with scripts/plan_corpus/exit_reasons.py CANONICAL_EXIT_REASONS, defined at scripts/plan_corpus/exit_reasons.py:42).
  • Aggregate-soak the §06.4-authored invariant→section cross-check detector (detect_invariant_acceptance_hook_missing / STRUCTURE:invariant-acceptance-hook-missing): run it over the permanent mixed corpus during the §12.2 soak and assert zero STRUCTURE:invariant-acceptance-hook-missing findings at steady state. §12 does NOT author the detector or wire the gate; it aggregates and soaks the already-wired detector. Detector surfaces: defined at scripts/plan_corpus/cross_section_check.py:756, registered in TARGETED_DETECTORS at :832, gate-wired at :892, with --self-test positive case 16 at :1319 and negative case at :1387. Provenance: authored and gate-wired in §06.4 per section-06-engine.md:254-255 (detector authoring-vs-consuming boundary). Ordering: §06 precedes §11C (this section’s depends_on: ["11C"] predecessor) in plan order, so the §06.4-authored detector and its completion-gate wiring exist before §12 runs. This is a semantic dependency on §06’s deliverable carried transitively through the strict-linear chain, NOT a direct §12 depends_on edge.
  • Subsection close (12.1) — all [x]; status: complete.
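
One precondition for the byte-identical contract above is that serialization does not depend on discovery order or dict ordering. The sketch below illustrates that idea only: canonical_manifest_bytes and manifest_digest are hypothetical names, the entry shape is assumed, and nothing here reflects how resume_manifest.build_manifest is actually written.

```python
import hashlib
import json


def canonical_manifest_bytes(entries):
    """Serialize (key, plan_id, node_id) entries to order-independent bytes.

    ASSUMPTION: each entry is a dict with exactly these three string fields.
    """
    rows = sorted((e["key"], e["plan_id"], e["node_id"]) for e in entries)
    payload = [{"key": k, "plan_id": p, "node_id": n} for k, p, n in rows]
    # sort_keys + fixed separators remove the remaining formatting freedom.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def manifest_digest(entries):
    """Short fingerprint a determinism test can compare across runs."""
    return hashlib.sha256(canonical_manifest_bytes(entries)).hexdigest()
```

A determinism pin in this style would compute the digest in-process, in a CLI subprocess, and in a fresh process, then assert that all three digests are equal.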

12.2 — Autopilot soak (fake-clock deterministic + real >=4h)

Owns the capstone soak deliverable (success_criterion “Autopilot soak”): the fake-clock deterministic soak, the real >=4h soak evidence artifact, the §11C→§12 dynamic_grow boundary contract, and the INV-17 / INV-18 end-to-end + cross-plan-deferred + halt-class soak assertions. The §12.1 invariant-coverage detector run feeds this soak; §12.3 then exercises the live /continue-roadmap --autopilot end-to-end run.

Permanent mixed-corpus fixture (definition — pinned) — the “permanent mixed corpus” the fake-clock soak grinds is a SYNTHETIC, git-tracked fixture authored under scripts/plan_orchestrator/tests/fixtures/mixed_corpus/, NOT live disk state and NOT a frozen snapshot of the real plans/ tree. Pinning it removes the reproducibility hole (the soak cannot trivially pass against an empty or all-converted corpus).

  • Author the fixture root scripts/plan_orchestrator/tests/fixtures/mixed_corpus/ containing exactly one plan dir of EACH of the three plan types per decisions/05-three-plan-types.md, so every route_v7.detect_route_mode branch and the serve_next_with_cross_plan merge are exercised on every soak run:
    • json_static/ — a v7 plan.json (schema: scripts-first-restructure/plan-routing/v7) with a fixed sections[] + work_items[] set, detect_route_mode → v7, no dynamic flag; supplies the steady-state served nodes.
    • json_dynamic/ — a v7 plan.json with dynamic: true and a growth_complete: false start, detect_route_mode → v7; supplies the dynamic_grow ≥2-cycle-then-plan_terminus boundary exercise (consumed by test_dynamic_grow_two_cycle_then_terminus).
    • legacy_markdown/ — a section-*.md-only plan dir with NO plan.json, detect_route_mode → markdown; supplies the retained-markdown-walk path coverage and the cross-plan blocker carrier (b) case.
    • At least one json_static → json_dynamic cross-plan edge (cross_plan_needs: [{plan, id}]) so serve_next_with_cross_plan hard-defers and resumes during the soak (consumed by test_cross_plan_all_deferred_resumes).
  • Non-emptiness pin: assert at soak setup that the fixture corpus contains >=1 plan of each of the three types and >=1 cross-plan edge — test_autopilot_soak_fakeclock.py::test_mixed_corpus_fixture_shape fails if any type bucket is empty or no cross-plan edge is present, so the soak can never green against a degenerate/empty corpus.
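
The non-emptiness pin above could look roughly like the following. This is a sketch under stated assumptions: classify_plan_dir is a stand-in for route_v7.detect_route_mode rather than its real logic, and placing cross_plan_needs on work_items entries is an assumed plan.json layout.

```python
import json
from pathlib import Path


def classify_plan_dir(plan_dir):
    """Bucket a fixture plan dir (hypothetical stand-in classifier)."""
    plan_json = Path(plan_dir) / "plan.json"
    if not plan_json.exists():
        return "legacy_markdown"
    data = json.loads(plan_json.read_text(encoding="utf-8"))
    return "json_dynamic" if data.get("dynamic") is True else "json_static"


def fixture_shape(root):
    """Count plans per type and cross-plan edges under the fixture root."""
    buckets = {"json_static": 0, "json_dynamic": 0, "legacy_markdown": 0}
    cross_plan_edges = 0
    for plan_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        kind = classify_plan_dir(plan_dir)
        buckets[kind] += 1
        if kind == "legacy_markdown":
            continue
        data = json.loads((plan_dir / "plan.json").read_text(encoding="utf-8"))
        # ASSUMPTION: edges live on work_items[*].cross_plan_needs.
        for item in data.get("work_items", []):
            cross_plan_edges += len(item.get("cross_plan_needs", []))
    return {"buckets": buckets, "cross_plan_edges": cross_plan_edges}


def assert_non_degenerate(shape):
    """Fail if any plan-type bucket is empty or no cross-plan edge exists."""
    empty = [kind for kind, count in shape["buckets"].items() if count == 0]
    if empty:
        raise AssertionError(f"empty plan-type buckets: {empty}")
    if shape["cross_plan_edges"] < 1:
        raise AssertionError("no cross-plan edge in fixture")
```

Running the shape check at soak setup, before the first served node, keeps a degenerate corpus from producing a vacuous green run.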

Soak test-design (fake-clock harness — pinned) — the fake-clock soak MUST be a deterministic unit harness, NOT a wall-clock sleep loop, so the assertions are stub-resistant and re-runnable:

  • Fake-clock mechanism: inject a monotonic FakeClock (a test double advancing simulated time only when the harness calls clock.advance(delta)); the engine reads simulated time exclusively through the injected clock — no time.sleep, no time.time() in the soak path. The harness advances the clock one simulated step per served next_action until N=24 simulated hours elapse OR every fixture plan reaches plan_terminus, whichever comes first.

  • Mock plan fixtures: drive the harness against the pinned scripts/plan_orchestrator/tests/fixtures/mixed_corpus/ corpus above (json_static + json_dynamic + legacy_markdown + the cross-plan edge); the harness copies the fixture into a tmp_path so each run mutates an isolated corpus and the git-tracked fixture stays pristine.

  • Edge case — dynamic plan that never reaches growth_complete: include a json_dynamic_never_complete/ fixture variant (or a harness flag) whose grow-branch keeps emitting dynamic_grow; assert the soak hits the simulated-hours cap WITHOUT a CURABLE halt and WITHOUT a false plan_terminus, and that the engine keeps advancing OTHER plans’ nodes meanwhile (the never-completing plan never starves the corpus). Pin in test_autopilot_soak_fakeclock.py::test_dynamic_never_complete_does_not_starve.

  • Edge case — multiple simultaneous cross-plan deferrals: configure >=2 distinct candidates whose cross_plan_needs targets are all not-completed at the same served step; assert serve_next_with_cross_plan defers all of them, advances each gating target, and resumes every deferred candidate to done with zero halts and no no_cross_plan_ready_node terminus. Pin in test_autopilot_soak_fakeclock.py::test_multiple_simultaneous_cross_plan_deferrals.
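
A minimal sketch of the injected-clock idea in the test-design above, assuming a one-minute simulated step and a serve_next callable that returns None once every plan has reached its terminus. Both are illustrative choices, and FakeClock / run_soak here are not the repo's harness.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class FakeClock:
    """Monotonic test clock: time moves only when advance() is called."""

    now_seconds: float = 0.0

    def now(self):
        return self.now_seconds

    def advance(self, delta):
        seconds = delta.total_seconds()
        if seconds < 0:
            raise ValueError("FakeClock is monotonic; delta must be >= 0")
        self.now_seconds += seconds


def run_soak(clock, serve_next, cap_hours=24, step=timedelta(minutes=1)):
    """Serve actions until the simulated cap elapses or serve_next is done.

    ASSUMPTION: serve_next(now) returns the next action, or None when every
    fixture plan has reached its terminus.
    """
    cap_seconds = cap_hours * 3600
    served = []
    while clock.now() < cap_seconds:
        action = serve_next(clock.now())
        if action is None:
            break
        served.append(action)
        clock.advance(step)  # one simulated step per served action
    return served
```

Because no code path sleeps or reads wall time, a plan that keeps emitting dynic growth actions forever still ends the run at the simulated cap, which is what the never-complete edge case needs in order to be testable.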

  • Fake-clock deterministic soak: author scripts/plan_orchestrator/tests/test_autopilot_soak_fakeclock.py driving the merged-route engine over an injected FakeClock (per the test-design above) for N=24 simulated hours (N>=1; 24 chosen to exceed the real >=4h floor with margin), asserting the engine grinds the permanent mixed-corpus fixture to steady state with ZERO CURABLE-class halts (every rules.HaltReason in the CURABLE partition auto-cured via cure_dispatch.py, never surfaced). Drive against a tmp_path copy of the pinned fixture so the soak is isolated and re-runnable.

  • Edge-case pins authored: test_autopilot_soak_fakeclock.py::test_dynamic_never_complete_does_not_starve (dynamic plan never reaching growth_complete caps without false terminus and does not starve other plans) + test_autopilot_soak_fakeclock.py::test_multiple_simultaneous_cross_plan_deferrals (>=2 concurrent cross-plan deferrals all resume to done).

  • Real >=4h soak evidence artifact: record a real-wall-clock /continue-roadmap --autopilot soak of >=4h to plans/scripts-first-restructure/references/soak-evidence.md (start/end SHAs, halt-reason tally, nodes advanced, cross-plan jumps observed); the soak node (§06.5) emits a genuine halt ONLY on true non-convergence.

  • §11C.3 dynamic_grow >=2-consecutive-growth-cycle assertion (the §11C→§12 boundary contract): the fake-clock soak names dynamic_grow as an exercised next_action.exit_kind and asserts the json-dynamic plan in the mixed corpus emits dynamic_grow for >=2 consecutive growth cycles with zero halts — each cycle the engine consumes the §11C.3 drain-branch cure (post_dispatch_resume + autopilot_proceed_directive), authors the next section/work_items, and resumes; once growth stops the plan emits plan_terminus (NOT another dynamic_grow) and moves to plans/completed/. Pin in test_autopilot_soak_fakeclock.py::test_dynamic_grow_two_cycle_then_terminus.

  • INV-17 Completion-Integrity end-to-end soak assertions (per decisions/01-completion-integrity-invariants.md; pin in scripts/plan_orchestrator/tests/test_soak_inv17.py): the soak asserts (a) a node whose declared deliverables are absent is QUARANTINED back to not-done, never auto-completed and never halts; (b) the schema rejects a done node lacking a valid attestation; (c) a completion-claim discrepancy is never auto-cured / hash-refreshed by cure_dispatch; (d) an expired/archived-owner route-access bypass is refused.

  • INV-18 Content-as-context-unit end-to-end soak assertions (per decisions/02-content-as-context-unit.md; pin in scripts/plan_orchestrator/tests/test_soak_inv18.py): the soak asserts (a) the next_action task injected during the run is the WHOLE body_ref content file, never a fragment / section_info slice; (b) every content/<id>--*.md stays under the §03 content-size cap; (c) an over-cap content file is SPLIT into sibling flat nodes (engine-minted ids+keys), never compacted, and does not reach done while over-cap.

  • cross_plan_all_deferred zero-halt assertion (pin in test_autopilot_soak_fakeclock.py::test_cross_plan_all_deferred_resumes): when every cross-plan candidate is hard-deferred by serve_next_with_cross_plan (all cross_plan_needs targets not completed), the soak asserts the engine retries / resumes — advances the gating target in another plan, then returns to complete the deferred candidate — and never emits a false no_cross_plan_ready_node terminus.

  • USER_DECISION + QUARANTINE halt-class resolution assertion (pin in test_autopilot_soak_fakeclock.py::test_user_decision_and_quarantine_halt_classes): the soak grinds the USER_DECISION + QUARANTINE halt-reason partitions (from §06.1 cure_dispatch.py partition sets, proven exhaustive by BOTH test_halt_reason_partition_exhaustive (enum rules.HaltReason members) AND test_halt_string_partition_exhaustive (non-enum halt STRINGS) per §06.1) to structured_exit / quarantine resolution with ZERO autopilot pause-leaks (no AskUserQuestion, no STRUCTURE:autopilot-pause-leak shape per skill-control-contract.md §Autopilot Mode). The soak corpus includes at least one string-keyed USER_DECISION route (a non-enum halt string routed through the USER_DECISION partition) so the string-partition path is exercised end-to-end, not only the enum path; assert it resolves to structured_exit with no pause-leak.

  • SoakRunSummary evidence gate: the fake-clock soak emits a structured SoakRunSummary JSON ({simulated_hours, served_node_count, halts_by_class: {CURABLE, GENUINE, USER_DECISION, QUARANTINE}, dispatches_by_skill, growth_cycles, cross_plan_jumps, terminus_reason}) written to scripts/plan_orchestrator/tests/fixtures/mixed_corpus/soak-run-summary.json; a validator test_autopilot_soak_fakeclock.py::test_soak_run_summary_shape asserts the schema (every halts_by_class.CURABLE count == 0, growth_cycles >= 2, cross_plan_jumps >= 1) so the soak’s pass is machine-checked, not prose-asserted. The real >=4h soak gates its §12.N completion checkbox on a scripts/plan_corpus/write.py:check_item --evidence file:plans/scripts-first-restructure/references/soak-evidence.md:<line-range> evidence pointer (per write.py:parse_evidence_pointer _EVIDENCE_KINDS), so the evidence artifact is required by the check-item API, never hand-flipped.

  • Bugs-as-nodes route assertion (pin in test_autopilot_soak_fakeclock.py::test_add_bug_fix_bug_fire_as_route_nodes): the mixed-corpus fixture includes a bug-route/ case (a node whose served next_action dispatches /add-bug then /fix-bug as route nodes, per the §06 bugs-as-nodes capability); a soak run-log assertion fails unless the run log records at least one /add-bug dispatch followed by its /fix-bug dispatch resolved as route nodes (not as a halt), proving the load-bearing bugs-as-nodes engine claim fires end-to-end during the soak.

  • NEEDS MANUAL ATTENTION escape assertion (pin in test_autopilot_soak_fakeclock.py::test_mixed_corpus_never_needs_manual_attention): across the full soak run the engine NEVER reaches the autopilot verdict-ladder escape VerdictClass.NEEDS_MANUAL_ATTENTION (scripts/plan_orchestrator/verdict_schema.py:42); the soak asserts every review-as-phase verdict resolves to a non-escape class and that the SoakRunSummary carries zero NEEDS MANUAL ATTENTION verdicts — reaching it would contradict the zero-halt mission (INV-6).

  • Programmatic cross-plan blocker resolution assertion (pin in scripts/plan_corpus/tests/test_cross_plan_defer.py::test_blocked_by_resolves_in_dependency_order, aggregated by §12.1’s test_aggregate_acceptance.py): a plan blocked_by another plan’s node (a cross_plan_needs: [{plan, id}] edge whose target is not completed) is mechanically asserted to resolve in dependency order — serve_next_with_cross_plan advances the gating target first, then the deferred candidate reaches done — so cross-plan blocker resolution is proven by the suite, not prose-asserted.

  • Subsection close (12.2) — all [x]; status: complete.
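
The SoakRunSummary gate described above can be pictured as a small validator over the emitted JSON. The field names and the three thresholds come from the item text; the function name, the error-list return style, and the strictness about extra halt classes are assumptions made for this sketch.

```python
REQUIRED_KEYS = {
    "simulated_hours",
    "served_node_count",
    "halts_by_class",
    "dispatches_by_skill",
    "growth_cycles",
    "cross_plan_jumps",
    "terminus_reason",
}
HALT_CLASSES = {"CURABLE", "GENUINE", "USER_DECISION", "QUARANTINE"}


def validate_soak_run_summary(summary):
    """Return a list of violations; an empty list means the summary passes."""
    missing = REQUIRED_KEYS - set(summary)
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    errors = []
    halts = summary["halts_by_class"]
    if set(halts) != HALT_CLASSES:
        errors.append(f"halts_by_class keys must be {sorted(HALT_CLASSES)}")
    if halts.get("CURABLE", 0) != 0:
        errors.append("CURABLE halts must be 0")
    if summary["growth_cycles"] < 2:
        errors.append("growth_cycles must be >= 2")
    if summary["cross_plan_jumps"] < 1:
        errors.append("cross_plan_jumps must be >= 1")
    return errors
```

Returning every violation at once, rather than failing on the first, makes a red soak easier to diagnose from a single run.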

12.3 — End-to-end autonomous cross-plan run

  • /continue-roadmap --autopilot on the PERMANENT MIXED corpus (exercising all three plan types per decisions/05-three-plan-types.md: json-static + json-dynamic + legacy-markdown — NOT a uniformly-converted corpus): auto-dispatches /add-bug + /fix-bug as route nodes, jumps to another plan’s node and returns, runs review inline as a route phase, commits path-scoped, never pauses.
  • §11C→§12 boundary contract on the LIVE path (per section-11C-dynamic-plans.md §11C→§12 boundary-contract success_criterion): the live /continue-roadmap --autopilot run consumes the §11C.3 drain-branch cure (post_dispatch_resume + autopilot_proceed_directive) end-to-end — the json-dynamic plan in the mixed corpus grows on drain, resumes, and (once growth stops) emits plan_terminus and moves to plans/completed/ via the live engine, with zero CURABLE-class halt in the run log. The dynamic_grow ≥2-cycle-then-plan_terminus ASSERTION itself is owned by §12.2 (test_autopilot_soak_fakeclock.py::test_dynamic_grow_two_cycle_then_terminus, deterministic harness); this item verifies the same boundary contract holds on the LIVE autopilot path, not the fake clock.
  • During the run, the §08 hook blocks every LLM plan.json Read/Edit/Write (verify from the hook log); the LLM only authored code/content.
  • Verify §01.2 DROP-default + MMM active: no uncited philosophy carried; every new mechanism this plan added cites its essential-complexity domain.
  • Verify INV-16: the §06.4 review-as-phase node records a /improve-tooling-history consultation (§4 Lessons + §6 Improvement Log + list-locked-designs for touched paths) before any section reaches reviewed/done; acceptance test asserts a sign-off attempt with no recorded consultation is rejected and the node stays not-done.
  • Verify INV-18 (per decisions/02-content-as-context-unit.md): the run’s injected next_action task is the whole body_ref file (hook/log shows no section_info fragment); every content/<id>--*.md is under the §03 cap; an injected over-cap content file is split into sibling flat nodes (never compacted) and does not reach done while over-cap.
  • Subsection close (12.3) — all [x]; status: complete.

12.R Third Party Review Findings

  • None.

12.N Completion Checklist

  • 12.1-12.3 [x] and status: complete.
  • Aggregate acceptance suite + soak green in compiler_repo/test-all.sh; real-soak evidence artifact recorded.
  • python -m scripts.plan_orchestrator.exit_reasons_check exit 0 + scripts/plan_corpus/tests/test_next_action_mapping_completeness.py green (exit_reason ↔ next_action routing parity).
  • Mixed-corpus fixture shape pinned: test_mixed_corpus_fixture_shape green (>=1 plan of each of the three types + >=1 cross-plan edge); soak cannot green against an empty/degenerate corpus.
  • cross_section_check.py --self-test green incl. the new invariant-acceptance-hook-missing positive+negative cases; detector run over the permanent mixed corpus reports zero findings at steady state.
  • End-to-end autonomous run verified; §08 hook log shows zero LLM plan.json access.
  • python -m scripts.plan_corpus check plans/scripts-first-restructure/section-12-*.md exit 0.
  • /tpr-review passed (final, full-section); plan-wide /review-plan clean.
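
The exit_reason ↔ next_action routing parity in the checklist is, at its core, a two-way set comparison. The sketch below shows only that idea: routing_parity_gaps is a hypothetical helper, and it does not describe what exit_reasons_check or test_next_action_mapping_completeness.py actually do.

```python
def routing_parity_gaps(canonical_exit_reasons, routed_exit_reasons):
    """Compare canonical exit reasons with those that have a next_action route.

    Both arguments are iterables of strings. The result lists gaps in each
    direction; parity holds only when both lists are empty.
    """
    canonical = set(canonical_exit_reasons)
    routed = set(routed_exit_reasons)
    return {
        # canonical reasons with no next_action route
        "unrouted_canonical": sorted(canonical - routed),
        # routed reasons that are not in the canonical set
        "unmapped_routes": sorted(routed - canonical),
    }


def has_parity(canonical_exit_reasons, routed_exit_reasons):
    gaps = routing_parity_gaps(canonical_exit_reasons, routed_exit_reasons)
    return not gaps["unrouted_canonical"] and not gaps["unmapped_routes"]
```

Checking both directions matters because a one-way check would pass when a route exists for a reason that was removed from the canonical set.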

References

  • §02-§11C acceptance suites; §06 engine + soak node; §08 hook log; §05 comparator.
  • §01 invariants INV-1 (scripts drive — proven by soak), INV-8 (one big system; autonomous cross-plan), INV-9 (path-scoped commits — §07 acceptance run here), INV-16 (improve-tooling-history-grounded review — §12.3 acceptance) — IDs per §01.1 invariant→section table.