Section 12 — Verification + Autopilot Soak
Cites §01 north star (invariants 6, 7, 8, 16, 18); consumes every prior section. Final proof the rebuild embodies the thesis.
Goal
See frontmatter. This section validates that the rebuilt system does what the subject plan only described: scripts drive, LLM only codes, zero halts, one autonomous cross-plan system.
Intelligence Reconnaissance
- scripts/intel-query.sh dag-ascii scripts-first-restructure --repo ori: §12 is the terminal node (predecessor §11C, no successor); strict-linear chain §09 → §09B → §10 → §11A → §11B → §11C → §12.
- scripts/intel-query.sh plan-status scripts-first-restructure --repo ori: §12 is the sole open section; §11C complete; §11A/§11B superseded; §02-§10 complete.
- scripts/intel-query.sh similar build_manifest --repo ori: the determinism-pin callers of the resume-manifest builder the §12.1 cross-plan determinism assertion grounds on.
- [ori:scripts/plan_corpus/resume_manifest.py] build_manifest: zero-context resume manifest; the byte-identical-across-process determinism contract (the §12.1 cross-plan determinism pin).
- [ori:scripts/plan_corpus/cross_section_check.py] detect_invariant_acceptance_hook_missing (:756) / TARGETED_DETECTORS string entry (:832, decl :831, deliberately excluded from ALL_DETECTORS) / gate-wire guard (:892, call :893): the detector + soak surfaces §12 grinds.
- [ori:scripts/plan_orchestrator/route_walk.py] dynamic_grow drain: the §11C.3 boundary-contract next_action.exit_kind the soak exercises for ≥2 growth cycles.
- [ori:plans/completed/scripts-first-workflow-architecture/00-overview.md]: the cross-plan dogfood subject, verified status: complete (54/54 terminal: 35 completed + 19 superseded) → §09.4 dogfood gate + depends_on carrier-(b) satisfied.
- Summary (≤500 chars): §12 is the strict-linear terminal node (predecessor §11C). The capstone soak grinds the merged route for N fake-clock hours + ≥4h real soak (§12.2), asserts dynamic_grow ≥2-cycle then plan_terminus, INV-16/17/18 end-to-end, cross_plan_all_deferred resume, USER_DECISION/QUARANTINE → structured_exit. Determinism pinned on resume_manifest.build_manifest. Dogfood subject already 54/54 terminal. ISO 2026-06-02.
Cross-plan close-out dependency
This plan cannot reach close-out until plans/completed/scripts-first-workflow-architecture (the v7 dogfood subject) reaches status: complete, per 00-overview.md §Cross-plan dependency.
The §12 cross-plan blocker is enforced structurally (depends_on edges + a close-gate), not by a frontmatter field:
- scripts-first-restructure is a legacy-markdown plan (route_v7.detect_route_mode → markdown; no plan.json).
- Per the 09B.4 carrier-select rule (decisions/05-three-plan-types.md INV-22), the edge carrier is the plan’s TYPE; for a legacy-markdown plan that carrier is (b) the up-chain mechanical gate, NOT a v7 cross_plan_needs schema field.
- NOT gated on §10 converting this plan to v7 (§10 was re-scoped by Decision-05: no corpus-wide JSON conversion).
Carrier (b) — the mechanical edge that carries the blocker structurally:
- §09.4 dogfood gate (in §09, status: complete) HARD-BLOCKS §09 close until the subject plan reaches status: complete; verified met (subject 54/54 terminal: 35 completed + 19 superseded).
- depends_on chain: §10 depends_on §09B depends_on §09; §12 reaches §09 transitively through the strict-linear chain (§09 → §09B → §10 → §11A → §11B → §11C → §12). So nothing past §09, including this §12, can run until the subject plan finishes.
The two together (dogfood-gate + depends_on chain) are the structural carrier: the blocker is enforced by depends_on edges + a close-gate that holds the chain, not by a §12 frontmatter field. The schema-field carrier (a) applies only AFTER an explicit/opt-in convert_plan_dir_v7 of this plan, which §10 does NOT mandate; while the plan stays legacy-markdown, carrier (b) is the permanent structural edge.
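The transitive reach of the depends_on chain can be sketched as a small graph walk. This is a minimal illustration, not the engine's code: the DEPENDS_ON table is hand-transcribed from the strict-linear chain named in this section (only the §10 → §09B → §09 and §12 → §11C edges are stated explicitly; the others are assumed from the chain order), and transitive_deps / is_runnable are hypothetical helper names.

```python
from typing import Dict, List, Set

# Assumption: edges transcribed from the strict-linear chain
# §09 → §09B → §10 → §11A → §11B → §11C → §12 described in this section.
DEPENDS_ON: Dict[str, List[str]] = {
    "09B": ["09"],
    "10": ["09B"],
    "11A": ["10"],
    "11B": ["11A"],
    "11C": ["11B"],
    "12": ["11C"],
}


def transitive_deps(section: str) -> List[str]:
    """Return every section reachable through depends_on, nearest first."""
    seen: List[str] = []
    stack = list(DEPENDS_ON.get(section, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.append(dep)
            stack.extend(DEPENDS_ON.get(dep, []))
    return seen


def is_runnable(section: str, closed: Set[str]) -> bool:
    """A section may run only once every transitive dependency has closed."""
    return all(dep in closed for dep in transitive_deps(section))
```

With §09 held open by the dogfood gate, is_runnable("12", ...) stays false no matter how many intermediate sections have closed, which is the structural property carrier (b) relies on.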
The subject plan’s terminal review-findings section records the handback to this section. §12.3’s autonomous-corpus soak then includes the now-complete subject plan as proof that the json-schema system grinds real work to done with zero halts.
12.1 — Aggregate acceptance + cross-plan determinism
- Aggregate the §02-§11C per-node acceptance suites into one blocking suite registered in compiler_repo/test-all.sh (the §12 work authors the suite-registration entry in compiler_repo/test-all.sh plus an aggregate runner at scripts/plan_corpus/tests/test_aggregate_acceptance.py that imports each per-section suite and fails the whole suite if any per-node test is red).
- Pin cross-plan determinism in scripts/plan_corpus/tests/test_resume_manifest_determinism.py: the (key, plan_id, node_id) merge is byte-identical across in-process / CLI-subprocess / fresh-process (the /clear-equivalent matrix), and resume_manifest.py:build_manifest reconstructs an identical resume pointer from disk alone (zero-context resume, invariant 6) per state-discipline.md §6.5.
- Add canonical exit_reason routing-parity to the aggregate gate: register python -m scripts.plan_orchestrator.exit_reasons_check and scripts/plan_corpus/tests/test_next_action_mapping_completeness.py in the aggregate suite + compiler_repo/test-all.sh, so the aggregate gate fails if any next_action exit_reason is unmapped or any canonical exit_reason lacks a next_action route (parity with scripts/plan_corpus/exit_reasons.py CANONICAL_EXIT_REASONS, defined at scripts/plan_corpus/exit_reasons.py:42).
- Aggregate-soak the §06.4-authored invariant→section cross-check detector (detect_invariant_acceptance_hook_missing / STRUCTURE:invariant-acceptance-hook-missing, defined at scripts/plan_corpus/cross_section_check.py:756, registered in TARGETED_DETECTORS at :832 and gate-wired at :892, with --self-test positive case 16 at :1319 + negative case at :1387; authored + gate-wired in §06.4 per section-06-engine.md:254-255, the detector authoring-vs-consuming boundary): run it over the permanent mixed corpus during the §12.2 soak and assert zero STRUCTURE:invariant-acceptance-hook-missing findings at steady state. §06 precedes §11C (this section’s depends_on: ["11C"] predecessor) in plan order, so the §06.4-authored detector + completion-gate wiring exist before §12 runs: a semantic dependency on §06’s deliverable carried transitively through the strict-linear chain, NOT a direct §12 depends_on edge. §12 does NOT author the detector or wire the gate; it aggregates and soaks the already-wired detector.
- Subsection close (12.1): all [x]; status: complete.
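The byte-identical-across-process pin can be illustrated with a stand-in builder. This is a sketch under stated assumptions: build_manifest_stub is a hypothetical substitute for resume_manifest.build_manifest (whose real signature is not shown in this section), and the entry fields beyond (key, plan_id, node_id) are invented for the example. The check itself is the point: serialize in-process, serialize again in a fresh interpreter, and compare raw bytes.

```python
import json
import subprocess
import sys
from typing import Dict, List

# Hypothetical stand-in for the real builder, kept as source text so that a
# fresh interpreter can execute exactly the same code.
BUILDER_SRC = '''
import json

def build_manifest_stub(entries):
    # Canonical order on the (key, plan_id, node_id) merge key.
    ordered = sorted(entries, key=lambda e: (e["key"], e["plan_id"], e["node_id"]))
    # sort_keys + fixed separators remove dict-order and whitespace variance.
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")
'''

# Assumed example data; only the three merge-key fields come from the plan text.
ENTRIES: List[Dict[str, str]] = [
    {"key": "k2", "plan_id": "json_dynamic", "node_id": "n1", "state": "open"},
    {"key": "k1", "plan_id": "json_static", "node_id": "n7", "state": "done"},
]


def in_process_bytes(entries: List[Dict[str, str]]) -> bytes:
    namespace: dict = {}
    exec(BUILDER_SRC, namespace)
    return namespace["build_manifest_stub"](entries)


def fresh_process_bytes(entries: List[Dict[str, str]]) -> bytes:
    script = BUILDER_SRC + (
        "\nimport sys\n"
        "entries = json.loads(sys.stdin.read())\n"
        "sys.stdout.buffer.write(build_manifest_stub(entries))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        input=json.dumps(entries).encode("utf-8"),
        capture_output=True,
        check=True,
    )
    return result.stdout
```

A determinism test would then assert in_process_bytes(ENTRIES) == fresh_process_bytes(ENTRIES), and repeat the comparison with the input order shuffled to confirm the merge key, not insertion order, decides the output.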
12.2 — Autopilot soak (fake-clock deterministic + real >=4h)
Owns the capstone soak deliverable (success_criterion “Autopilot soak”): the fake-clock deterministic soak, the real >=4h soak evidence artifact, the §11C→§12 dynamic_grow boundary contract, and the INV-17 / INV-18 end-to-end + cross-plan-deferred + halt-class soak assertions. The §12.1 invariant-coverage detector run feeds this soak; §12.3 then exercises the live /continue-roadmap --autopilot end-to-end run.
Permanent mixed-corpus fixture (definition — pinned) — the “permanent mixed corpus” the fake-clock soak grinds is a SYNTHETIC, git-tracked fixture authored under scripts/plan_orchestrator/tests/fixtures/mixed_corpus/, NOT live disk state and NOT a frozen snapshot of the real plans/ tree. Pinning it removes the reproducibility hole (the soak cannot trivially pass against an empty or all-converted corpus).
- Author the fixture root scripts/plan_orchestrator/tests/fixtures/mixed_corpus/ containing exactly one plan dir of EACH of the three plan types per decisions/05-three-plan-types.md, so every route_v7.detect_route_mode branch and the serve_next_with_cross_plan merge are exercised on every soak run:
  - json_static/: a v7 plan.json (schema: scripts-first-restructure/plan-routing/v7) with a fixed sections[] + work_items[] set, detect_route_mode → v7, no dynamic flag; supplies the steady-state served nodes.
  - json_dynamic/: a v7 plan.json with dynamic: true and a growth_complete: false start, detect_route_mode → v7; supplies the dynamic_grow ≥2-cycle-then-plan_terminus boundary exercise (consumed by test_dynamic_grow_two_cycle_then_terminus).
  - legacy_markdown/: a section-*.md-only plan dir with NO plan.json, detect_route_mode → markdown; supplies the retained-markdown-walk path coverage and the cross-plan blocker carrier (b) case.
  - At least one json_static ↔ json_dynamic cross-plan edge (cross_plan_needs: [{plan, id}]) so serve_next_with_cross_plan hard-defers and resumes during the soak (consumed by test_cross_plan_all_deferred_resumes).
- Non-emptiness pin: assert at soak setup that the fixture corpus contains >=1 plan of each of the three types and >=1 cross-plan edge. test_autopilot_soak_fakeclock.py::test_mixed_corpus_fixture_shape fails if any type bucket is empty or no cross-plan edge is present, so the soak can never green against a degenerate/empty corpus.
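A shape check along these lines could back the non-emptiness pin. It is a sketch only: classify_plan_dir is a simplified, hypothetical stand-in for route_v7.detect_route_mode, and placing cross_plan_needs on individual work_items entries is an assumption about the plan.json layout, which this section does not spell out.

```python
import json
from collections import Counter
from pathlib import Path
from typing import List

PLAN_TYPES = ("json_static", "json_dynamic", "legacy_markdown")


def classify_plan_dir(plan_dir: Path) -> str:
    """Hypothetical classifier: plan.json presence and the dynamic flag decide."""
    plan_json = plan_dir / "plan.json"
    if plan_json.exists():
        plan = json.loads(plan_json.read_text(encoding="utf-8"))
        return "json_dynamic" if plan.get("dynamic") else "json_static"
    if any(plan_dir.glob("section-*.md")):
        return "legacy_markdown"
    return "unknown"


def count_cross_plan_edges(plan_dir: Path) -> int:
    """Count cross_plan_needs entries (assumed to live on work_items)."""
    plan_json = plan_dir / "plan.json"
    if not plan_json.exists():
        return 0
    plan = json.loads(plan_json.read_text(encoding="utf-8"))
    return sum(len(item.get("cross_plan_needs", [])) for item in plan.get("work_items", []))


def check_fixture_shape(corpus_root: Path) -> List[str]:
    """Return shape problems; an empty list means the fixture is usable."""
    plan_dirs = [p for p in sorted(corpus_root.iterdir()) if p.is_dir()]
    types = Counter(classify_plan_dir(p) for p in plan_dirs)
    problems = [f"no {t} plan" for t in PLAN_TYPES if types[t] == 0]
    if sum(count_cross_plan_edges(p) for p in plan_dirs) == 0:
        problems.append("no cross-plan edge")
    return problems
```

Returning a list of problems rather than a bare boolean lets the test failure message name exactly which bucket is empty.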
Soak test-design (fake-clock harness — pinned) — the fake-clock soak MUST be a deterministic unit harness, NOT a wall-clock sleep loop, so the assertions are stub-resistant and re-runnable:
- Fake-clock mechanism: inject a monotonic FakeClock (a test double advancing simulated time only when the harness calls clock.advance(delta)); the engine reads simulated time exclusively through the injected clock, with no time.sleep and no time.time() in the soak path. The harness advances the clock one simulated step per served next_action until N=24 simulated hours elapse OR every fixture plan reaches plan_terminus, whichever comes first.
- Mock plan fixtures: drive the harness against the pinned scripts/plan_orchestrator/tests/fixtures/mixed_corpus/ corpus above (json_static + json_dynamic + legacy_markdown + the cross-plan edge); the harness copies the fixture into a tmp_path so each run mutates an isolated corpus and the git-tracked fixture stays pristine.
- Edge case (dynamic plan that never reaches growth_complete): include a json_dynamic_never_complete/ fixture variant (or a harness flag) whose grow-branch keeps emitting dynamic_grow; assert the soak hits the simulated-hours cap WITHOUT a CURABLE halt and WITHOUT a false plan_terminus, and that the engine keeps advancing OTHER plans’ nodes meanwhile (the never-completing plan never starves the corpus). Pin in test_autopilot_soak_fakeclock.py::test_dynamic_never_complete_does_not_starve.
- Edge case (multiple simultaneous cross-plan deferrals): configure >=2 distinct candidates whose cross_plan_needs targets are all not-completed at the same served step; assert serve_next_with_cross_plan defers all of them, advances each gating target, and resumes every deferred candidate to done with zero halts and no no_cross_plan_ready_node terminus. Pin in test_autopilot_soak_fakeclock.py::test_multiple_simultaneous_cross_plan_deferrals.
- Fake-clock deterministic soak: author scripts/plan_orchestrator/tests/test_autopilot_soak_fakeclock.py driving the merged-route engine over an injected FakeClock (per the test-design above) for N=24 simulated hours (N>=1; 24 chosen to exceed the real >=4h floor with margin), asserting the engine grinds the permanent mixed-corpus fixture to steady state with ZERO CURABLE-class halts (every rules.HaltReason in the CURABLE partition auto-cured via cure_dispatch.py, never surfaced). Drive against a tmp_path copy of the pinned fixture so the soak is isolated and re-runnable.
- Edge-case pins authored: test_autopilot_soak_fakeclock.py::test_dynamic_never_complete_does_not_starve (dynamic plan never reaching growth_complete caps without false terminus and does not starve other plans) + test_autopilot_soak_fakeclock.py::test_multiple_simultaneous_cross_plan_deferrals (>=2 concurrent cross-plan deferrals all resume to done).
- Real >=4h soak evidence artifact: record a real-wall-clock /continue-roadmap --autopilot soak of >=4h to plans/scripts-first-restructure/references/soak-evidence.md (start/end SHAs, halt-reason tally, nodes advanced, cross-plan jumps observed); the soak node (§06.5) emits a genuine halt ONLY on true non-convergence.
- §11C.3 dynamic_grow >=2-consecutive-growth-cycle assertion (the §11C→§12 boundary contract): the fake-clock soak names dynamic_grow as an exercised next_action.exit_kind and asserts the json-dynamic plan in the mixed corpus emits dynamic_grow for >=2 consecutive growth cycles with zero halts. Each cycle the engine consumes the §11C.3 drain-branch cure (post_dispatch_resume + autopilot_proceed_directive), authors the next section/work_items, and resumes; once growth stops the plan emits plan_terminus (NOT another dynamic_grow) and moves to plans/completed/. Pin in test_autopilot_soak_fakeclock.py::test_dynamic_grow_two_cycle_then_terminus.
- INV-17 Completion-Integrity end-to-end soak assertions (per decisions/01-completion-integrity-invariants.md; pin in scripts/plan_orchestrator/tests/test_soak_inv17.py): the soak asserts (a) a node whose declared deliverables are absent is QUARANTINED back to not-done, never auto-completed and never halts; (b) the schema rejects a done node lacking a valid attestation; (c) a completion-claim discrepancy is never auto-cured / hash-refreshed by cure_dispatch; (d) an expired/archived-owner route-access bypass is refused.
- INV-18 Content-as-context-unit end-to-end soak assertions (per decisions/02-content-as-context-unit.md; pin in scripts/plan_orchestrator/tests/test_soak_inv18.py): the soak asserts (a) the next_action task injected during the run is the WHOLE body_ref content file, never a fragment / section_info slice; (b) every content/<id>--*.md stays under the §03 content-size cap; (c) an over-cap content file is SPLIT into sibling flat nodes (engine-minted ids+keys), never compacted, and does not reach done while over-cap.
- cross_plan_all_deferred zero-halt assertion (pin in test_autopilot_soak_fakeclock.py::test_cross_plan_all_deferred_resumes): when every cross-plan candidate is hard-deferred by serve_next_with_cross_plan (all cross_plan_needs targets not completed), the soak asserts the engine retries / resumes (advances the gating target in another plan, then returns to complete the deferred candidate) and never emits a false no_cross_plan_ready_node terminus.
- USER_DECISION + QUARANTINE halt-class resolution assertion (pin in test_autopilot_soak_fakeclock.py::test_user_decision_and_quarantine_halt_classes): the soak grinds the USER_DECISION + QUARANTINE halt-reason partitions (from §06.1 cure_dispatch.py partition sets, proven exhaustive by BOTH test_halt_reason_partition_exhaustive (enum rules.HaltReason members) AND test_halt_string_partition_exhaustive (non-enum halt STRINGS) per §06.1) to structured_exit/quarantine resolution with ZERO autopilot pause-leaks (no AskUserQuestion, no STRUCTURE:autopilot-pause-leak shape per skill-control-contract.md §Autopilot Mode). The soak corpus includes at least one string-keyed USER_DECISION route (a non-enum halt string routed through the USER_DECISION partition) so the string-partition path is exercised end-to-end, not only the enum path; assert it resolves to structured_exit with no pause-leak.
- SoakRunSummary evidence gate: the fake-clock soak emits a structured SoakRunSummary JSON ({simulated_hours, served_node_count, halts_by_class: {CURABLE, GENUINE, USER_DECISION, QUARANTINE}, dispatches_by_skill, growth_cycles, cross_plan_jumps, terminus_reason}) written to scripts/plan_orchestrator/tests/fixtures/mixed_corpus/soak-run-summary.json; a validator test_autopilot_soak_fakeclock.py::test_soak_run_summary_shape asserts the schema (every halts_by_class.CURABLE count == 0, growth_cycles >= 2, cross_plan_jumps >= 1) so the soak’s pass is machine-checked, not prose-asserted. The real >=4h soak gates its §12.N completion checkbox on a scripts/plan_corpus/write.py:check_item --evidence file:plans/scripts-first-restructure/references/soak-evidence.md:<line-range> evidence pointer (per write.py:parse_evidence_pointer_EVIDENCE_KINDS), so the evidence artifact is required by the check-item API, never hand-flipped.
- Bugs-as-nodes route assertion (pin in test_autopilot_soak_fakeclock.py::test_add_bug_fix_bug_fire_as_route_nodes): the mixed-corpus fixture includes a bug-route/ case (a node whose served next_action dispatches /add-bug then /fix-bug as route nodes, per the §06 bugs-as-nodes capability); a soak run-log assertion fails unless the run log records at least one /add-bug dispatch followed by its /fix-bug dispatch resolved as route nodes (not as a halt), proving the load-bearing bugs-as-nodes engine claim fires end-to-end during the soak.
- NEEDS MANUAL ATTENTION escape assertion (pin in test_autopilot_soak_fakeclock.py::test_mixed_corpus_never_needs_manual_attention): across the full soak run the engine NEVER reaches the autopilot verdict-ladder escape VerdictClass.NEEDS_MANUAL_ATTENTION (scripts/plan_orchestrator/verdict_schema.py:42); the soak asserts every review-as-phase verdict resolves to a non-escape class and that the SoakRunSummary carries zero NEEDS MANUAL ATTENTION verdicts. Reaching it would contradict the zero-halt mission (INV-6).
- Programmatic cross-plan blocker resolution assertion (pin in scripts/plan_corpus/tests/test_cross_plan_defer.py::test_blocked_by_resolves_in_dependency_order, aggregated by §12.1’s test_aggregate_acceptance.py): a plan blocked_by another plan’s node (a cross_plan_needs: [{plan, id}] edge whose target is not completed) is mechanically asserted to resolve in dependency order: serve_next_with_cross_plan advances the gating target first, then the deferred candidate reaches done, so cross-plan blocker resolution is proven by the suite, not prose-asserted.
- Subsection close (12.2): all [x]; status: complete.
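The fake-clock harness and the SoakRunSummary gate described in this subsection can be sketched together. This is a minimal illustration, not the real harness: serve_next is a hypothetical callable standing in for the merged-route engine (it returns an exit_kind string, or None once every plan has reached plan_terminus), the "cross_plan_jump" and "node_served" kinds and the 10-minute step are assumptions, and the summary carries only a subset of the fields listed above.

```python
from datetime import timedelta
from typing import Callable, List, Optional


class FakeClock:
    """Monotonic test double: time moves only when advance() is called."""

    def __init__(self) -> None:
        self._now = timedelta(0)

    def now(self) -> timedelta:
        return self._now

    def advance(self, delta: timedelta) -> None:
        if delta <= timedelta(0):
            raise ValueError("FakeClock only moves forward")
        self._now += delta


def run_soak(
    serve_next: Callable[[], Optional[str]],
    clock: FakeClock,
    cap: timedelta = timedelta(hours=24),
    step: timedelta = timedelta(minutes=10),  # assumed step size
) -> dict:
    summary = {
        "served_node_count": 0,
        "growth_cycles": 0,
        "cross_plan_jumps": 0,
        "halts_by_class": {"CURABLE": 0, "GENUINE": 0, "USER_DECISION": 0, "QUARANTINE": 0},
        "terminus_reason": "simulated_hours_cap",
    }
    while clock.now() < cap:
        exit_kind = serve_next()
        if exit_kind is None:  # every plan reached plan_terminus
            summary["terminus_reason"] = "plan_terminus"
            break
        summary["served_node_count"] += 1
        if exit_kind == "dynamic_grow":
            summary["growth_cycles"] += 1
        elif exit_kind == "cross_plan_jump":  # hypothetical exit_kind name
            summary["cross_plan_jumps"] += 1
        clock.advance(step)  # one simulated step per served action
    summary["simulated_hours"] = clock.now() / timedelta(hours=1)
    return summary


def summary_problems(summary: dict) -> List[str]:
    """Apply the three thresholds named for test_soak_run_summary_shape."""
    problems = []
    if summary["halts_by_class"]["CURABLE"] != 0:
        problems.append("CURABLE halts surfaced")
    if summary["growth_cycles"] < 2:
        problems.append("fewer than 2 growth cycles")
    if summary["cross_plan_jumps"] < 1:
        problems.append("no cross-plan jump")
    return problems


def scripted_engine(script: List[str]) -> Callable[[], Optional[str]]:
    """Replay a fixed list of exit kinds, then report terminus with None."""
    it = iter(script)
    return lambda: next(it, None)
```

Because the loop never touches wall-clock time, a never-completing engine such as `lambda: "dynamic_grow"` terminates at the simulated cap in a fraction of a second, which is what makes the never-complete edge case cheap to pin.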
12.3 — End-to-end autonomous cross-plan run
- /continue-roadmap --autopilot on the PERMANENT MIXED corpus (exercising all three plan types per decisions/05-three-plan-types.md: json-static + json-dynamic + legacy-markdown, NOT a uniformly-converted corpus): auto-dispatches /add-bug + /fix-bug as route nodes, jumps to another plan’s node and returns, runs review inline as a route phase, commits path-scoped, never pauses.
- §11C→§12 boundary contract on the LIVE path (per section-11C-dynamic-plans.md §11C→§12 boundary-contract success_criterion): the live /continue-roadmap --autopilot run consumes the §11C.3 drain-branch cure (post_dispatch_resume + autopilot_proceed_directive) end-to-end. The json-dynamic plan in the mixed corpus grows on drain, resumes, and (once growth stops) emits plan_terminus and moves to plans/completed/ via the live engine, with zero CURABLE-class halt in the run log. The dynamic_grow ≥2-cycle-then-plan_terminus ASSERTION itself is owned by §12.2 (test_autopilot_soak_fakeclock.py::test_dynamic_grow_two_cycle_then_terminus, deterministic harness); this item verifies the same boundary contract holds on the LIVE autopilot path, not the fake clock.
- During the run, the §08 hook blocks every LLM plan.json Read/Edit/Write (verify from the hook log); the LLM only authored code/content.
- Verify §01.2 DROP-default + MMM active: no uncited philosophy carried; every new mechanism this plan added cites its essential-complexity domain.
- Verify INV-16: the §06.4 review-as-phase node records a /improve-tooling-history consultation (§4 Lessons + §6 Improvement Log + list-locked-designs for touched paths) before any section reaches reviewed/done; acceptance test asserts a sign-off attempt with no recorded consultation is rejected and the node stays not-done.
- Verify INV-18 (per decisions/02-content-as-context-unit.md): the run’s injected next_action task is the whole body_ref file (hook/log shows no section_info fragment); every content/<id>--*.md is under the §03 cap; an injected over-cap content file is split into sibling flat nodes (never compacted) and does not reach done while over-cap.
- Subsection close (12.3): all [x]; status: complete.
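The INV-16 acceptance condition (sign-off rejected without a recorded consultation, node stays not-done) can be modelled as a small gate. This is a sketch only: ReviewNode and its methods are hypothetical names, and the three consultation labels are shorthand for the §4 Lessons, §6 Improvement Log, and list-locked-designs items named above, not identifiers from the real engine.

```python
from dataclasses import dataclass, field
from typing import Set

# Shorthand labels (assumed) for the three consultation parts INV-16 names.
REQUIRED_CONSULTATION = ("lessons", "improvement_log", "locked_designs")


@dataclass
class ReviewNode:
    node_id: str
    status: str = "not-done"
    consultation: Set[str] = field(default_factory=set)

    def record_consultation(self, part: str) -> None:
        if part not in REQUIRED_CONSULTATION:
            raise ValueError(f"unknown consultation part: {part}")
        self.consultation.add(part)

    def sign_off(self) -> bool:
        """Reject sign-off, leaving status untouched, unless every part is recorded."""
        missing = [p for p in REQUIRED_CONSULTATION if p not in self.consultation]
        if missing:
            return False
        self.status = "reviewed"
        return True
```

An acceptance test in this shape would call sign_off() on a fresh node, assert it returns False with status still "not-done", then record all three parts and assert the second attempt succeeds.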
12.R Third Party Review Findings
- None.
12.N Completion Checklist
- 12.1-12.3 [x] and status: complete.
- Aggregate acceptance suite + soak green in compiler_repo/test-all.sh; real-soak evidence artifact recorded.
- python -m scripts.plan_orchestrator.exit_reasons_check exit 0 + scripts/plan_corpus/tests/test_next_action_mapping_completeness.py green (exit_reason ↔ next_action routing parity).
- Mixed-corpus fixture shape pinned: test_mixed_corpus_fixture_shape green (>=1 plan of each of the three types + >=1 cross-plan edge); soak cannot green against an empty/degenerate corpus.
- cross_section_check.py --self-test green incl. the new invariant-acceptance-hook-missing positive+negative cases; detector run over the permanent mixed corpus reports zero findings at steady state.
- End-to-end autonomous run verified; §08 hook log shows zero LLM plan.json access.
- python -m scripts.plan_corpus check plans/scripts-first-restructure/section-12-*.md exit 0.
- /tpr-review passed (final, full-section); plan-wide /review-plan clean.
References
- §02-§11C acceptance suites; §06 engine + soak node; §08 hook log; §05 comparator.
- §01 invariants INV-1 (scripts drive — proven by soak), INV-8 (one big system; autonomous cross-plan), INV-9 (path-scoped commits — §07 acceptance run here), INV-16 (improve-tooling-history-grounded review — §12.3 acceptance) — IDs per §01.1 invariant→section table.