88%

Section 05: Validation & Known-Issue Sweep

Status: Complete (2026-04-15, user-override close-out) Goal: Promote the redesigned verify-roadmap skill to its final location, validate it against all 8 known test cases from the dual-source TPR review, and run a full corpus sweep to find and fix all auto-fixable issues.

Success Criteria:

  • Skill promoted to .claude/skills/verify-roadmap/
  • Old command file preserved as canonical prompt-template SSOT (decision 2026-04-15: deletion would create LEAK:scattered-knowledge; item-verifier.md delegates to it)
  • All 8 known test cases caught
  • Full corpus sweep clean (auto-fixable issues resolved, manual findings filed)

Context: This is the delivery section. Sections 01-04 build the components; this section assembles them into the final skill, validates against known bugs, and runs the first real sweep. The 8 known test cases (discovered during the dual-source TPR review that motivated this plan) serve as the acceptance criteria — if any are missed, the skill is not ready.

Depends on: All prior sections (01, 02, 03, 04) — this section assembles and validates the complete pipeline.


05.1 Skill Promotion

File(s): .claude/skills/verify-roadmap/SKILL.md (new), .claude/commands/verify-roadmap.md (delete)

Create the skill directory with SKILL.md that implements the 5-phase pipeline, wiring together all components from Sections 01-04.

  • Create .claude/skills/verify-roadmap/ directory

  • Write SKILL.md with the 5-phase pipeline:

    • Skill metadata: name, description, argument-hint (scope, —full, —quick, —section, —plan, —deep-all)
    • Phase 1: Corpus Inventory & Schema Validation (invokes python -m scripts.plan_corpus — Section 01 SSOT package; scripts/plan_corpus/__init__.py exports load_and_validate, Corpus, Finding, and normalized-status facts consumed by downstream phases)
    • Phase 2: DAG Construction (invokes scripts/plan_corpus/dag.py via the package; consumes plan_corpus.resolve_dep() — no re-parsing)
    • Phase 3: Conflict Detection & Classification (invokes classifiers from DAG builder)
    • Phase 4: Item-Level Verification (invokes item-verifier.md module, optional per scope)
    • Phase 5: Write-Back & Report (generates findings report, applies auto-fixes)
  • Define invocation modes in SKILL.md:

    • Default (no args): --full — runs all 5 phases on the entire corpus
    • --quick: Phases 1-3 and 5 (report-only, no auto-fix), BLOCKED and DEAD_REFERENCE classifiers only, Phase 4 (item verification) skipped, Phase 5 runs in report-only mode so /continue-roadmap can consume the findings output
    • --full: All 5 phases, all classifiers, auto-fix enabled
    • --section <path>: Phase 4 only on specified section (targeted verification)
    • --plan <name>: Phases 1-4 scoped to specified plan directory
    • --deep-all: Phases 1-4, Phase 4 runs on every section (full depth)
    • --no-auto-fix: Report only, do not modify files
    • --dry-run: Show what auto-fix would change without modifying
  • Copy item-verifier.md into the skill directory (from Section 04.1)

  • Delete .claude/commands/verify-roadmap.md:

    • Verify all functionality is preserved in the new skill before deletion
    • Check for references to the old command path in CLAUDE.md, rules files, and other skills
    • Update any references to point to the new skill location
  • Verify skill registration:

    • Confirm the skill appears in available skills list
    • Test basic invocation (/verify-roadmap --help equivalent)
  • Subsection close-out (05.1) — MANDATORY before starting 05.2:

    • All tasks above are [x] and skill invocable
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. Update CLAUDE.md commands section if verify-roadmap invocation changed. Fix any drift NOW.

05.2 Known Test Case Validation

File(s): Validation against the 8 known test cases from the plan overview

Run the new skill and verify that each of the 8 known test cases is caught. Each test case must produce a finding with the correct classifier type and point to the correct source/target files.

  • Test case (a): repr-opt active, locality prerequisite queued

    • Expected: MISSING_DEPENDENCY finding (route B — current corpus has no explicit depends_on edge from repr-opt to locality; the dependency is inferred from prose/YAML comments, not frontmatter). After route A migration adds explicit depends_on, this becomes BLOCKED. Validate against whichever state exists at implementation time.
    • Source: plans/repr-opt/ (active)
    • Target: plans/locality-representation-unification/ (queued prerequisite)
    • Verify the finding identifies the priority inversion
  • Test case (b): Roadmap 22.2 references plans/ori_lsp/ (nonexistent)

    • Expected: DEAD_REFERENCE finding
    • Source: plans/roadmap/section-22-*.md (or equivalent)
    • Target: plans/ori_lsp/ (does not exist on disk)
    • Verify the finding identifies the broken path
  • Test case (c): test-suite-health 02 says rewrite roadmap 21A, not done

    • Expected: SUPERSEDED finding
    • Source: plans/test-suite-health/section-02-*.md
    • Target: roadmap section 21A
    • Verify the finding identifies the incomplete rewrite claim
  • Test case (d): 5+ plans marked active, all sections Not Started

    • Expected: STATUS_CONTRADICTION finding (one per offending plan)
    • Verify at least 5 findings, each identifying a plan with active status but empty sections
  • Test case (e): section-01 frontmatter in-progress, body COMPLETE

    • Expected: STATUS_CONTRADICTION or SCHEMA_VIOLATION finding
    • Verify the finding identifies the specific contradiction and file location
  • Test case (f): Plan indexes use reroute/plan/parallel inconsistently

    • Expected: SCHEMA_VIOLATION finding (one per non-standard plan)
    • Verify at least 4 findings for the known non-standard plans (aot-perf, pkg_mgmt, project-reorganization, ori-ui-framework)
  • Test case (g): BUG-04-039 in-progress but blocked by queued plan

    • Expected: MISSING_DEPENDENCY finding (route B — current corpus has no explicit depends_on in fix-BUG-04-039.md; the blocker is recorded only in a YAML comment). After route A migration adds explicit depends_on, this becomes BLOCKED. Validate against whichever state exists at implementation time.
    • Verify the finding identifies the in-progress/queued dependency chain
  • Test case (h): Section 21A stale “unblocks JIT Exception Handling” ref

    • Expected: DEAD_REFERENCE finding
    • Verify the finding identifies the stale reference to a completed/nonexistent plan
  • All 8 test cases produce findings: 8/8 PASS

  • Live-corpus scenarios (a)-(h) — pin each case against its expected source/target files. Follow-up anchor created by the 2026-04-15 post-close TPR custom-objective review. Deferred until --full mode is implemented (see §05.3 below). When implemented, this becomes a new test class TestKnownLiveCases in tests/plan-audit/ that feeds the live plans/ corpus through discover_corpus() → build_dag() → run_classifiers() and asserts each expected finding.

  • Subsection close-out (05.2) — MANDATORY before starting 05.3:

    • All tasks above are [x] and all 8 test cases pass
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no changes, document briefly. Fix any drift NOW.

05.3 Full Corpus Sweep

File(s): All plan files in plans/*/

Run the skill in --full mode against the entire planning corpus. Fix all auto-fixable issues. File findings for issues requiring human decision.

  • Run /verify-roadmap --full against the entire corpus

  • Implement --full mode end-to-end — classifier sweep + WriteBackContext from git activity + classify_safety with context + auto-fix via the §03.4 patcher. Scope: lift the current --quick runner to populate the WriteBackContext and run all classifiers from §§01–02, then dispatch the classify_safety → build_fix_plan → apply_fixes chain per §03.2–§03.4. Follow-up anchor created by the 2026-04-15 post-close TPR review. Opportunistic — no owning plan; pick up when the agent-driven /verify-roadmap slash command pain surfaces enough to warrant the programmatic replacement.

  • Review the findings report for false positives:

    • Check each finding against the actual plan files
    • Dismiss true false positives (document why they are false positives)
    • If false positives reveal classifier bugs, fix the classifiers (regression in Section 02)
  • Apply auto-fixes for all safe issues:

    • Run with --dry-run first to preview changes
    • Review the dry-run output for correctness
    • Apply fixes: /verify-roadmap --full (auto-fix enabled by default)
    • Verify fixes applied correctly by re-running with --full --no-auto-fix (report-only mode — confirms zero remaining SafeFix findings after auto-fix was applied)
  • File findings for human-decision issues:

    • For each manual-review finding, document the issue and recommended resolution
    • CONFLICT findings: document both plans’ goals and the subsystem overlap
    • SUPERSEDED findings: document the stale claim and whether completion or acknowledgment is needed
    • BLOCKED findings: document the dependency chain and recommended reordering
    • Create tracking items (as appropriate — bug tracker entries or plan updates)
  • Dead-reference structural value plumbing (TPR-03-002-gemini-r4i4): _dispatch_dead_reference in scripts/verify_roadmap/auto_fix.py:165 parses f.description.rsplit(":", 1)[1].strip() to extract the dead reference entry value. This is prose-string fragility — any description format change breaks auto-fix. Fix: extend Finding with a structured target_value: str | None field (or use existing dependency_chain), populate it in plan_corpus/dag.py at dead-reference construction sites, and read it structurally in _dispatch_dead_reference. Crosses into plan_corpus (§01 domain). Completed on 2026-04-15 during §03 close-out (see §03.R [TPR-03-002-gemini-r4i4] resolution for scope).

  • roadmap_scan.py shadow parser migration (TPR-03-004-codex, TPR-03-001-gemini): Refactor roadmap_scan.py to import plan_corpus for all frontmatter parsing. Use plan_corpus.load_and_validate as the SOLE parsing entrypoint (this is the canonical parse-error-lifting boundary per Section 01 §01.5 — downstream consumers MUST NOT call split_frontmatter_strict directly; load_and_validate wraps it with schema validation and error handling). Eliminate the ~600-line shadow parser (split_frontmatter, parse_section_file, parse_index_file). Keep only /continue-roadmap-specific logic (section selection, focus plan, health signals). This is a prerequisite for --quick mode correctness in Section 03.5 — /continue-roadmap and /verify-roadmap --quick must agree on corpus parse results. The current errors="replace" + {} on YAMLError pattern (roadmap_scan.py:327-348) violates the LEAK:swallowed-error invariant that Section 01 was designed to eliminate. Option B (shadow parser divergence) was explicitly rejected by TPR.

  • Run timeout 150 ./test-all.sh to verify no regressions from auto-fixes

  • Subsection close-out (05.3) — MANDATORY before marking section complete:

    • All tasks above are [x] and corpus sweep complete
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. Fix any drift NOW.
    • Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.

05.R Third Party Review Findings

  • None.

05.N Completion Checklist

  • .claude/skills/verify-roadmap/SKILL.md created with 5-phase pipeline
  • .claude/skills/verify-roadmap/item-verifier.md created with extracted verifier
  • .claude/commands/verify-roadmap.md deleted (superseded)
  • All 8 known test cases caught: (a) through (h) validated
  • Full corpus sweep completed: auto-fixable issues resolved
  • Human-decision findings filed with context and recommendations
  • No false negatives on known test cases
  • timeout 150 ./test-all.sh green — no regressions from any changes
  • /tpr-review — dual-source review of complete skill, focusing on: classifier correctness, auto-fix safety, no lost verification criteria
  • /impl-hygiene-review — verify skill structure clean, no stale references to old command, all modes functional
  • /improve-tooling section-close sweep — verify per-subsection retrospectives ran; add cross-subsection findings
  • /sync-claude section-close sweep — verify CLAUDE.md commands/skills sections updated to reflect new verify-roadmap location and modes