Section 03: Findings Report & Write-Back

Status: In Progress

Goal: Design the findings report format and implement the write-back mechanism that auto-fixes safe issues and flags issues requiring human decision. Connect the output to /continue-roadmap so cross-plan conflicts surface during active roadmap work.

Success Criteria:

  • Safety taxonomy (SafetyClass, ClassifiedFinding, WriteBackContext) defined and tested
  • Findings report format defined and implemented (JSON + markdown + console)
  • Frontmatter text patcher operates on raw text (regex), never PyYAML dump/reload
  • Auto-fix engine handles safe issues without human intervention, with concurrent-session guards
  • Manual-review issues are flagged with clear context and recommended actions
  • Integration with /continue-roadmap surfaces findings during active work

Context: Sections 01 and 02 produce raw findings (schema violations, DAG conflicts, priority inversions). This section turns those findings into actionable output: a structured report for review, an auto-fix engine for safe corrections, and integration with the existing /continue-roadmap workflow so findings surface at the right time. The distinction between auto-fixable and manual-review issues is critical — auto-fixing frontmatter field renames is safe; auto-resolving goal conflicts between plans is not.

Depends on: Section 02 (DAG Builder) — the report format depends on the classifier output structure.

Architectural decisions:

  1. PyYAML is read-only. PyYAML safe_load is used to PARSE frontmatter. It is NEVER used to WRITE frontmatter back. PyYAML dump destroys YAML comments (which are DAG signal per Section 02’s HTML_COMMENT_CONVENTION and YAML_COMMENT source kinds), reorders keys, strips trailing whitespace, normalizes quoting style, and flattens multi-line strings. All frontmatter writes go through the targeted text patcher (03.4) which operates on the raw text slice between the --- fences using line-level regex replacements. This is the ONLY safe write path. See also: roadmap_scan.py:344 (yaml.safe_load for read) — the same constraint applies there.

  2. Safety taxonomy lives here, not in plan_corpus. SafetyClass, ClassifiedFinding, and classify_safety are write-back policy. The plan_corpus library produces factual Finding records (no policy); this section consumes those findings and classifies them for write-back. plan_corpus must never import from this module.

  3. Concurrent-session safety is mandatory. The user runs parallel agent sessions with uncommitted work (see MEMORY.md feedback_never_destructive_git.md). Any read-modify-write on plan files must: (a) record a preimage hash at scan time, (b) re-read and hash-compare before write, (c) write to a temp file and os.replace atomically. If the file changed between scan and write, refuse to apply and log the conflict.
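The three-step guard in decision 3 can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper names content_hash and guarded_write are hypothetical, and tempfile.mkstemp is an assumed choice of temp-file naming.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """(a) SHA-256 of the file's raw bytes, as recorded at scan time."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def guarded_write(path: Path, preimage_hash: str, new_text: str) -> bool:
    """Write new_text to path only if the file still matches preimage_hash.

    Returns True when the write happened, False when it was refused.
    Hypothetical helper for illustration only.
    """
    # (b) Re-read and hash-compare immediately before writing.
    if content_hash(path) != preimage_hash:
        return False  # caller logs the conflict; the file is left untouched

    # (c) Write to a temp file in the same directory, then swap atomically.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as tmp:
            tmp.write(new_text)
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap completed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

A small window remains between the hash comparison and os.replace, so the guard narrows the race rather than eliminating it. mkstemp also creates the temp file with owner-only permissions, so a real implementation may want to copy the original file's mode before the swap.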


03.1 Safety Taxonomy & Data Types

File(s): New module in the verify-roadmap skill (e.g. scripts/verify_roadmap/safety.py or inline in the skill’s write-back logic)

Define the safety taxonomy data types that the report format (03.2) and auto-fix engine (03.3) both consume. This subsection exists to break the circular dependency identified by tp-help: the report format needs ClassifiedFinding to serialize, and the auto-fix engine needs SafetyClass to gate writes — both need the types before either can be implemented.

  • Define SafetyClass(Enum): SafeFix | ExposureReview — the auto-fix gating tag:

    • SafeFix findings are applied automatically (with backup + log)
    • ExposureReview findings are surfaced for human review (never auto-applied)
  • Define ClassifiedFinding dataclass:

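    • Fields: finding: Finding, safety_class: SafetyClass, rationale: str, resolved_by_sibling: Finding.id | None (set only by the paired-finding deduplication rule described under classify_safety)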
    • Wraps a plain Finding (imported from plan_corpus; NO safety_class on the Finding itself per 01.3)
    • Section 03 produces ClassifiedFinding records; Sections 01/02 never do
  • Define WriteBackContext dataclass:

    • Field: has_recent_commits: dict[Path, bool] — maps plan directories to git activity signal
    • The CLI front-end populates this by running git log --since=14d -- plans/<name>/ at the edge
    • plan_corpus stays pure — grep-verify it contains no subprocess or git calls
    • --quick mode optimization (blind spot #10): building WriteBackContext requires one git log subprocess call per plan, O(N) overall. --quick mode runs only read-only DAG checks (BLOCKED, DEAD_REFERENCE), which do not need git signals. --quick MUST bypass WriteBackContext population entirely by passing context=None to the report generator. In --quick mode, classify_safety skips classification and marks all findings as ExposureReview (report-only, no auto-fix). This is a correctness requirement, not just a performance optimization: --quick is a pre-check, not a write-back trigger.
  • Define PreimageRecord dataclass (concurrent-session guard):

    • Fields: path: Path, content_hash: str, scan_timestamp: float
    • content_hash is hashlib.sha256(path.read_bytes()).hexdigest()
    • Captured at scan time for every file that might be modified
    • Used by the text patcher (03.4) to detect concurrent modifications before write
  • Implement classify_safety(finding: Finding, context: WriteBackContext | None, frontmatter_data: dict | None = None) -> ClassifiedFinding:

    • Signature note (TPR-03-001-gemini): the frontmatter_data parameter carries the parsed frontmatter dict for the finding’s source file. This allows classify_safety to inspect sibling fields (e.g., checking whether both plan: and name: exist for the collision guard) WITHOUT performing I/O — the dict is pre-parsed by plan_corpus.parser at scan time. The function remains pure: (finding, context, dict) -> ClassifiedFinding.
    • When context is None (--quick mode): return ClassifiedFinding(finding, ExposureReview, "quick mode — no write-back classification")
    • When context is provided: dispatch on finding.category + finding.subtype:

    SCHEMA_VIOLATION subtypes — SafeFix:

    • Field rename plan: -> name:. NOTE: OverviewSchema canonically uses plan: (see schemas.py:89-91); PlanIndexSchema canonically uses name: (see schemas.py:39). Renaming plan: to name: is ONLY valid on files where the schema expects name: but the file has plan: instead (i.e., the file is a PlanIndexSchema file misusing plan:). SafeFix ONLY when:
      • The target file’s schema class is PlanIndexSchema (the schema that requires name:) AND the file has plan: instead of name:
      • NOT valid on OverviewSchema files — those canonically use plan: as a required field; renaming it to name: would violate the schema
      • Collision guard (blind spot #3): if the file already has BOTH a plan: key AND a name: key with DIFFERENT values, this is ExposureReview (human must decide which value to keep). Check uses frontmatter_data parameter: "plan" in frontmatter_data and "name" in frontmatter_data and frontmatter_data["plan"] != frontmatter_data["name"] — no I/O needed, dict is pre-parsed
      • If plan: exists and name: does not (on a PlanIndexSchema file), SafeFix: rename key preserving value byte-for-byte
      • If both exist with identical values, SafeFix: remove the plan: key (redundant)
      • Paired-finding deduplication (TPR-03-002-gemini): When plan: is used instead of name:, plan_corpus.schema emits TWO findings: UNKNOWN_FIELD: plan AND MISSING_REQUIRED_FIELD: name. The rename SafeFix resolves BOTH. The auto-fix dispatcher (03.3) must deduplicate these: when a plan:→name: rename is applied, mark the paired MISSING_REQUIRED_FIELD: name finding as resolved-by-sibling (do NOT surface it as a separate ExposureReview). Add a resolved_by_sibling: Finding.id | None field to ClassifiedFinding for this case.
    • Removing reroute: false — SafeFix (default-equivalent value)
    • Adding missing reviewed: false default — SafeFix ONLY for PlanSectionSchema and RoadmapSectionSchema where reviewed: bool is a REQUIRED field with no default (see schemas.py:62,76). Workflow behavior guard (blind spot #7): for PlanIndexSchema where reviewed: bool | None = None is OPTIONAL, auto-inserting reviewed: false is ExposureReview because it triggers the /continue-roadmap Step 1.7 unreviewed-plan gate (see SKILL.md:205-218). The absence of the field means “no review state” (None), which does NOT trigger the gate; false actively triggers it. This is a semantic change, not normalization.
    • Adding missing third_party_review: {status: none, updated: null} — SafeFix where the field is required by schema (PlanSectionSchema, FixBugSchema)

    SCHEMA_VIOLATION subtypes — ExposureReview:

    • MISSING_REQUIRED_FIELD when the missing field needs semantic inference from body content (e.g. missing frontmatter entirely — reconstructing canonical frontmatter from body content is semantic inference, not normalization)

    STATUS_CONTRADICTION subtypes:

    • PLAN_ACTIVE_ALL_SECTIONS_NOT_STARTED — SafeFix IFF context.has_recent_commits[plan_dir] == False (no activity supports status=queued); else ExposureReview (recent commits suggest the plan IS actively being worked on but sections are stale — needs human)
    • FM_DECLARED_VS_BODY_DERIVED: ALWAYS ExposureReview (blind spot #4). The normalizer (normalizer.py:155-159) returns derived="complete" when has_complete_marker is True even when unchecked > 0 (aspirational COMPLETE marker with remaining work). Auto-fixing status to complete based on this derivation is WRONG: it would mark plans as complete when they have unchecked checkboxes. The normalizer intentionally returns “complete” to trigger the FM_DECLARED_VS_BODY_DERIVED finding; the finding itself is the signal that human review is needed, not that auto-fix should proceed. The auto-fix engine MUST NOT override the ExposureReview classification for this subtype.
    • PLAN_COMPLETE_WITH_OPEN_SECTIONS — ExposureReview (semantic decision: complete open sections or downgrade plan status)
    • All other STATUS_CONTRADICTION subtypes — ExposureReview by default (conservative)

    DEAD_REFERENCE subtypes — SafeFix (frontmatter only):

    • PLAN_DIRECTORY_NOT_FOUND / SECTION_FILE_NOT_FOUND / CROSS_PLAN_NAME_NOT_FOUND when the dead reference is in a depends_on frontmatter list entry (mechanical removal from a YAML list). Prose body references are ALWAYS ExposureReview (human-authored replacement may be needed)
    • SPEC_FILE_NOT_FOUND — ExposureReview (NOT SafeFix). The spec: field lives on RoadmapSectionSchema (schemas.py:81) and references spec file paths. A dead spec reference may indicate a spec file was renamed or reorganized — the correct target needs human determination. Unlike depends_on entries where removal is mechanical, a missing spec file may need a replacement path, not deletion.
    • Audit trail guard (blind spot #8): dead-reference removal audit trail goes to build/verify-roadmap/fixes-applied.json, NOT as inline HTML comments. An inline <!-- Removed dead reference to plans/X/ --> comment would be re-scanned by Section 02’s HTML_COMMENT_CONVENTION parser and produce false positive MISSING_DEPENDENCY findings in future runs. The fixes-applied.json log is the audit trail.

    All other categories:

    • PARSE_ERROR, DAG_CONFLICT, ITEM_VERIFICATION, GAP — ExposureReview by default (conservative; never auto-applied). The default branch MUST record the rationale "no SafeFix rule declared for <category>/<subtype>"

    • Each ClassifiedFinding carries a rationale string explaining why it got its class

    • Pure function of (finding, context, frontmatter_data); no I/O inside classify_safety itself

  • Tests (TDD — write before implementation):

    • Matrix: every (FindingCategory, FindingSubtype) pair in types.py:_CATEGORY_SUBTYPES must have a test case asserting its safety classification
    • Semantic pins: FM_DECLARED_VS_BODY_DERIVED -> ExposureReview (pin: revert to SafeFix -> test fails)
    • Semantic pins: PLAN_ACTIVE_ALL_SECTIONS_NOT_STARTED with has_recent_commits=True -> ExposureReview
    • Semantic pins: PLAN_ACTIVE_ALL_SECTIONS_NOT_STARTED with has_recent_commits=False -> SafeFix
    • Negative pins: classify_safety with context=None MUST return ExposureReview for every finding
    • Collision guard pin: plan: -> name: rename when both keys exist with different values -> ExposureReview
    • Collision guard pin: plan: -> name: rename when both keys exist with same values -> SafeFix (remove plan:)
    • Workflow behavior pin: reviewed: false insertion on PlanIndexSchema -> ExposureReview
    • Workflow behavior pin: reviewed: false insertion on PlanSectionSchema -> SafeFix
  • Subsection close-out (03.1) — MANDATORY before starting 03.2:

    • All tasks above are [x] and types + classify_safety tested
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no changes, document briefly. Fix any drift NOW.
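Two of the 03.1 rules lend themselves to a short sketch: the plan: -> name: rename with its collision guard, and the git-activity gate on PLAN_ACTIVE_ALL_SECTIONS_NOT_STARTED. The helper names classify_plan_rename and classify_active_not_started, the schema_name parameter, and the conservative defaults for unlisted cases are assumptions made for illustration; the real classify_safety dispatches on finding.category and finding.subtype.

```python
from enum import Enum
from pathlib import Path
from typing import Dict, Tuple


class SafetyClass(Enum):
    SafeFix = "safe_fix"
    ExposureReview = "exposure_review"


def classify_plan_rename(schema_name: str, frontmatter_data: dict) -> Tuple[SafetyClass, str]:
    """Classify a plan: -> name: rename from pre-parsed frontmatter (no I/O).

    schema_name is a hypothetical stand-in for however the schema class
    of the source file is made available to the classifier.
    """
    if schema_name != "PlanIndexSchema":
        # OverviewSchema canonically uses plan:, so the rename is never applied there.
        return SafetyClass.ExposureReview, f"plan: is not renamed on {schema_name} files"
    if "plan" not in frontmatter_data:
        # Assumed conservative default: nothing to rename.
        return SafetyClass.ExposureReview, "no plan: key to rename"
    if "name" not in frontmatter_data:
        return SafetyClass.SafeFix, "rename plan: to name:, value preserved byte-for-byte"
    if frontmatter_data["plan"] != frontmatter_data["name"]:
        # Collision guard: a human must decide which value to keep.
        return SafetyClass.ExposureReview, "plan: and name: both present with different values"
    return SafetyClass.SafeFix, "plan: duplicates name:; remove the redundant plan: key"


def classify_active_not_started(
    has_recent_commits: Dict[Path, bool], plan_dir: Path
) -> Tuple[SafetyClass, str]:
    """SafeFix only when git shows no recent activity for the plan directory."""
    # Assumption: a plan directory missing from the map is treated as active.
    if has_recent_commits.get(plan_dir, True):
        return SafetyClass.ExposureReview, "recent commits suggest the plan is being worked on"
    return SafetyClass.SafeFix, "no recent activity; status can be normalized to queued"
```

Both helpers take only already-parsed data, which keeps the classification step free of file and subprocess access as the purity requirement demands.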

03.2 Report Format

File(s): Report generation integrated into the verify-roadmap skill pipeline

Design and implement the findings report format. The report must be both human-readable (markdown) and machine-parseable (JSON) for downstream tool integration. This subsection CONSUMES the types defined in 03.1.

  • Import the finding data model from plan_corpus (01.3 SSOT — do NOT redefine here):

    • Finding = {id, category, subtype, severity, source, source_line, source_column, target, target_line, description, recommended_fix, evidence, dependency_chain, source_kind}
    • FindingCategory and FindingSubtype enums are imported (see Section 01.3 for the complete taxonomy)
    • Finding.to_json() / Finding.to_markdown() are used as-is; Section 03 only wraps them into a report
  • Import ClassifiedFinding and SafetyClass from 03.1 (local to this section’s module; NOT from plan_corpus). The report serializes ClassifiedFinding records — each entry includes the finding data PLUS the safety classification, rationale, and sibling resolution state.

  • Implement JSON report output:

    • Array of ClassifiedFinding objects: each has finding (the Finding.to_json() dict), safety_class ("safe_fix" or "exposure_review"), rationale (string), resolved_by_sibling (Finding.id string or null — non-null when this finding was resolved as a side-effect of fixing a paired finding, e.g., MISSING_REQUIRED_FIELD: name resolved by the plan:→name: rename)
    • Written to build/verify-roadmap/findings.json (build directory, not committed)
    • Include metadata header: timestamp, corpus size, classifier versions, mode (--full / --quick)
    • When mode is --quick, omit safety_class and rationale fields (classification was not performed)
  • Implement markdown report output:

    • Grouped by severity (critical first, then high, medium, low)
    • Within each severity, grouped by safety class (ExposureReview first, then SafeFix)
    • Within each group, sorted by classifier type
    • Each finding shows: type badge, source -> target, description, recommended fix, safety classification
    • Summary table at top: count by type and severity, count by safety class
    • Written to build/verify-roadmap/findings.md (build directory, not committed)
  • Implement console summary output:

    • One-line-per-finding format for terminal display
    • Color-coded by severity (if terminal supports it)
    • SafeFix findings marked with [auto] prefix; ExposureReview with [review]; unapplied fixes marked with [UNAPPLIED] (concurrent-modification refusal from PatchResult(applied=False))
    • Exit code reflects findings: 0 = clean, 1 = findings present, 2 = critical findings
  • Unapplied-fix report surface (TPR-03-003-codex / TPR-03-002-gemini): The report format must surface PatchResult(applied=False) results from the auto-fix engine as a distinct group in both JSON and markdown output. In JSON: add an unapplied_fixes array alongside the main findings array. In markdown: add an “Unapplied Fixes” section after the main findings grouped by reason (concurrent modification, malformed file, etc.). These are NOT dropped — they represent intended work that could not safely complete.

  • Tests (TDD):

    • Round-trip test: ClassifiedFinding -> JSON -> parse -> verify all fields preserved
    • Markdown grouping test: verify severity ordering, safety class ordering
    • Exit code test: 0 for empty findings, 1 for low/medium/high, 2 for critical
    • --quick mode test: verify JSON output omits safety_class/rationale
    • Unapplied-fix surface test: verify that PatchResult(applied=False) entries appear in the unapplied_fixes group in both JSON and markdown reports (not silently dropped)
  • Subsection close-out (03.2) — MANDATORY before starting 03.3:

    • All tasks above are [x] and report generates correctly on current corpus
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no changes, document briefly. Fix any drift NOW.
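The report rules in 03.2 (exit codes, markdown ordering, and the fields dropped in --quick mode) can be sketched as below. Entries are plain dicts standing in for serialized ClassifiedFinding records. The top-level JSON shape (an object with metadata, findings, and unapplied_fixes keys), the function names, and the use of category as the "classifier type" sort key are assumptions, not the defined format.

```python
import json
from typing import List

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
SAFETY_ORDER = {"exposure_review": 0, "safe_fix": 1}  # review items listed first


def exit_code(entries: List[dict]) -> int:
    """0 = clean, 1 = findings present, 2 = at least one critical finding."""
    if not entries:
        return 0
    if any(e["finding"]["severity"] == "critical" for e in entries):
        return 2
    return 1


def markdown_order(entries: List[dict]) -> List[dict]:
    """Severity first, then safety class, then category (assumed sort key)."""
    return sorted(
        entries,
        key=lambda e: (
            SEVERITY_ORDER[e["finding"]["severity"]],
            SAFETY_ORDER[e["safety_class"]],
            e["finding"]["category"],
        ),
    )


def build_json_report(entries: List[dict], unapplied_fixes: List[dict], mode: str) -> str:
    """Serialize the report; classification fields are omitted in --quick mode."""
    quick = mode == "--quick"
    findings = []
    for e in entries:
        item = {"finding": e["finding"]}
        if not quick:
            # These fields exist only when classify_safety actually ran.
            item["safety_class"] = e["safety_class"]
            item["rationale"] = e["rationale"]
            item["resolved_by_sibling"] = e.get("resolved_by_sibling")
        findings.append(item)
    report = {
        "metadata": {"mode": mode},  # timestamp, corpus size, versions left out here
        "findings": findings,
        "unapplied_fixes": list(unapplied_fixes),
    }
    return json.dumps(report, indent=2)
```

Keeping unapplied_fixes as its own key, even when empty, gives downstream tools a stable place to look for refused patches.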

03.3 Auto-Fix Engine

File(s): Auto-fix logic integrated into verification pipeline

Implement automatic fixes for findings classified as SafeFix by 03.1’s classify_safety. Safety criterion: a fix is auto-fixable only if it is pure metadata normalization that cannot change plan semantics.

  • Implement auto-fix dispatcher:

    • Input: list of ClassifiedFinding records
    • Filter to safety_class == SafeFix only
    • For each SafeFix finding, dispatch to the appropriate fix handler based on finding.category + finding.subtype
    • All fixes go through the text patcher (03.4) — the auto-fix engine NEVER writes files directly
  • Implement auto-fix for SCHEMA_VIOLATION SafeFix findings:

    • Field rename plan: -> name: (via text patcher: regex replace ^plan: with name: in frontmatter slice; preserving value byte-for-byte)
    • Field removal: reroute: false -> remove entire line from frontmatter slice
    • Default field insertion: add reviewed: false via text patcher (insert line in frontmatter slice) — only for PlanSectionSchema/RoadmapSectionSchema files (see 03.1 workflow behavior guard)
    • Default field insertion: add third_party_review: block — only for schemas where required
  • Implement auto-fix for STATUS_CONTRADICTION SafeFix findings:

    • PLAN_ACTIVE_ALL_SECTIONS_NOT_STARTED (when classified SafeFix by 03.1): change status: active to status: queued in frontmatter via text patcher
    • NOTE: FM_DECLARED_VS_BODY_DERIVED is NEVER SafeFix (see 03.1). The auto-fix engine MUST assert that no FM_DECLARED_VS_BODY_DERIVED finding reaches the SafeFix dispatch — this is a defense-in-depth invariant. If it fires, the classifier has a bug.
    • parallel: true guard (from Section 01.2): parallel: true is a VALID canonical PlanIndexSchema field. Auto-fix MUST NOT remove it. Verify no fix handler touches fields outside its explicit scope.
  • Implement auto-fix for DEAD_REFERENCE SafeFix findings:

    • Remove dead depends_on entries from frontmatter list via text patcher
    • Audit trail in fixes-applied.json only (blind spot #8): do NOT add inline HTML comments like <!-- Removed dead reference to plans/X/ -->. Section 02’s HTML_COMMENT_CONVENTION parser scans for blocked-by, unblocks, supersedes, resolves patterns in HTML comments. While a “Removed dead reference” comment does not match those verbs today, any future verb expansion or fuzzy matching would produce false positive MISSING_DEPENDENCY findings. The fixes-applied.json log is the permanent audit trail.
    • Do NOT auto-remove references from prose body text (might need human-authored replacement)
  • Implement safe-fix guards:

    • All auto-fixes create a backup of the original file in build/verify-roadmap/backups/
    • All auto-fixes are logged to build/verify-roadmap/fixes-applied.json with: finding ID, file path, fix type, before/after snippet, timestamp
    • --dry-run flag shows what would be fixed without modifying files
    • --no-auto-fix flag disables auto-fixing entirely (report-only mode)
    • Defense-in-depth: auto-fix engine MUST reject any finding that is not SafeFix — this is a hard assert, not a silent skip. If an ExposureReview finding leaks into the auto-fix path, it is a classifier bug and must fail loudly.
    • Concurrent-modification propagation (TPR-03-003-codex / TPR-03-002-gemini): when apply_patch returns PatchResult(applied=False) (preimage hash mismatch from concurrent session), the auto-fix dispatcher MUST convert the original SafeFix finding into an ExposureReview finding with the failure reason appended to the rationale (e.g., "SafeFix reverted to ExposureReview: file modified by concurrent session") and append it to the final report as an unapplied fix. The report format (03.2) must surface these as a distinct “unapplied fixes” group — they represent work the tool intended to do but could not safely complete. They MUST NOT be silently dropped.
  • Define manual-review flagging for non-auto-fixable findings:

    • CONFLICT findings: always manual — requires human decision on which plan’s goals take precedence
    • SUPERSEDED findings: always manual — requires acknowledgment that a reroute claim is stale or completion of the reroute. §02 handoff note (TPR-03-005-codex): Section 02 defines a git-aware SUPERSEDED specialization with two structural cases (section-02-dag-builder.md:251-252). classify_safety deliberately routes ALL SUPERSEDED findings to ExposureReview (never SafeFix) because SUPERSEDED resolution is inherently semantic — the user must decide whether the reroute claim is valid, stale, or in progress. WriteBackContext.has_recent_commits is available for future SafeFix graduation if a narrow, safe subcase is identified (e.g., “SUPERSEDED by a plan with status: resolved”), but no such subcase is implemented in this section. This is an explicit design decision, not an omission.
    • BLOCKED findings: always manual — requires plan reordering or dependency acknowledgment
    • MISSING_DEPENDENCY findings: always manual — requires explicit dependency declaration or acknowledgment of independence
    • All ExposureReview-classified findings: surfaced in the report with context and recommended actions
  • Tests (TDD):

    • Semantic pin: FM_DECLARED_VS_BODY_DERIVED reaching auto-fix dispatcher -> assert/panic (defense-in-depth)
    • Semantic pin: parallel: true field untouched by any fix handler
    • Matrix: each SafeFix subtype has a test case verifying the correct text transformation
    • Negative pin: ExposureReview finding passed to auto-fix dispatcher -> rejected
    • Backup test: verify backup file created before modification
    • Dry-run test: verify no file modifications in dry-run mode
    • Idempotency test: running auto-fix twice on the same corpus produces identical results
  • Subsection close-out (03.3) — MANDATORY before starting 03.4:

    • All tasks above are [x] and auto-fix engine tested on known cases
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no changes, document briefly. Fix any drift NOW.
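The dispatcher's guards (filter to SafeFix, fail loudly on leaks, and downgrade on a refused patch) can be sketched as follows. The simplified ClassifiedFinding fields, the handler mapping keyed by subtype, and the names apply_safe_fix and run_auto_fix are hypothetical; the sketch raises AssertionError explicitly because plain assert statements are removed under python -O.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, Dict, List


class SafetyClass(Enum):
    SafeFix = "safe_fix"
    ExposureReview = "exposure_review"


@dataclass(frozen=True)
class ClassifiedFinding:
    # Simplified stand-in: the real record wraps a full Finding.
    finding_id: str
    subtype: str
    safety_class: SafetyClass
    rationale: str


@dataclass(frozen=True)
class PatchResult:
    applied: bool
    reason: str


Handler = Callable[[ClassifiedFinding], PatchResult]


def apply_safe_fix(cf: ClassifiedFinding, handlers: Dict[str, Handler]) -> ClassifiedFinding:
    """Apply one fix; return cf unchanged on success, downgraded on refusal."""
    if cf.safety_class is not SafetyClass.SafeFix:
        raise AssertionError(f"non-SafeFix finding {cf.finding_id} reached auto-fix")
    if cf.subtype == "FM_DECLARED_VS_BODY_DERIVED":
        raise AssertionError("FM_DECLARED_VS_BODY_DERIVED must never be SafeFix")
    result = handlers[cf.subtype](cf)
    if result.applied:
        return cf
    # Refused patch: surface it for review instead of dropping it.
    return replace(
        cf,
        safety_class=SafetyClass.ExposureReview,
        rationale=f"{cf.rationale}; SafeFix reverted to ExposureReview: {result.reason}",
    )


def run_auto_fix(findings: List[ClassifiedFinding], handlers: Dict[str, Handler]) -> List[ClassifiedFinding]:
    """Filter to SafeFix, then apply each fix through its handler."""
    safe = [cf for cf in findings if cf.safety_class is SafetyClass.SafeFix]
    return [apply_safe_fix(cf, handlers) for cf in safe]
```

Any record that comes back with ExposureReview from run_auto_fix is an unapplied fix and belongs in the report's unapplied group.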

03.4 Frontmatter Text Patcher

File(s): New module for targeted text-level frontmatter manipulation

This subsection implements the ONLY write path for frontmatter modifications. PyYAML is read-only; all writes go through targeted text patching on the raw frontmatter slice. This subsection also implements the concurrent-session safety guards.

Rationale (blind spot #1): PyYAML safe_load parses YAML into Python dicts (losing comments, key order, quoting style, trailing whitespace). If we were to modify the dict and yaml.dump it back, every comment in the frontmatter — including YAML comments that are DAG signal for Section 02’s YAML_COMMENT source kind — would be destroyed. Additionally, key ordering changes produce noisy git diffs. The text patcher operates on the raw text between the --- fences, using line-level regex replacements that preserve everything the fix does not explicitly target.
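To make the comment-preservation point concrete, here is a minimal sketch of two line-level operations, following the regex shapes this subsection specifies for rename_key and replace_value. One deliberate deviation, labeled as an assumption: the sketch uses [ \t]* instead of \s* so a match can never span a line break.

```python
import re


def rename_key(fm_text: str, old_key: str, new_key: str) -> str:
    """Rename a top-level key, keeping its value, spacing, and inline comment."""
    pattern = re.compile(rf"^{re.escape(old_key)}([ \t]*:.*)$", re.MULTILINE)
    # A function replacement avoids backslash interpretation in new_key.
    return pattern.sub(lambda m: new_key + m.group(1), fm_text, count=1)


def replace_value(fm_text: str, key: str, new_value: str) -> str:
    """Replace everything after 'key:' on that line, keeping the key formatting."""
    pattern = re.compile(rf"^({re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + new_value, fm_text, count=1)


frontmatter = "# owner: tools\nplan: roadmap-verify  # keep this\nstatus: active\n"
renamed = rename_key(frontmatter, "plan", "name")
patched = replace_value(renamed, "status", "queued")
```

Only the targeted lines change: the leading comment line, the inline comment, and the key order all pass through untouched. As specified, replace_value overwrites the rest of the line, so an inline comment on the replaced line would not survive.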

  • Implement extract_frontmatter_slice(text: str) -> tuple[str, int, int]:

    • Returns (frontmatter_text, start_offset, end_offset) — the raw text between --- fences (exclusive of fences)
    • Uses the same boundary detection as plan_corpus.parser.split_frontmatter_strict (exact fence regex from types.py:FRONTMATTER_FENCE) — note: the actual API name is split_frontmatter_strict, NOT split_frontmatter
    • Returns ("", 0, 0) on malformed files (no fences); the caller handles that case
  • Implement per-fix-type text operations (all operate on the frontmatter slice string):

    • rename_key(fm_text: str, old_key: str, new_key: str) -> str — regex ^{old_key}(\s*:.*)$ -> {new_key}\1 (preserves value, spacing, inline comments)
    • remove_key(fm_text: str, key: str) -> str — remove the entire line matching ^{key}\s*:.*$ (handles multi-line values by tracking indent)
    • replace_value(fm_text: str, key: str, new_value: str) -> str — regex ^({key}\s*:\s*).*$ -> \1{new_value} (preserves key formatting)
    • insert_key(fm_text: str, key: str, value: str, after_key: str | None) -> str — insert {key}: {value} on a new line after after_key (or at end of frontmatter if after_key is None)
    • remove_list_item(fm_text: str, list_key: str, item_value: str) -> str — remove a single - "value" entry from a YAML list under list_key, handling both inline [a, b] and block - a\n- b list styles
  • Implement apply_patch(path: Path, fm_operations: list[FmOperation], preimage: PreimageRecord) -> PatchResult:

    • FmOperation = (operation_type, **kwargs) matching the per-fix-type operations above
    • Concurrent-session guard (blind spot #6):
      1. Re-read path and compute sha256(content)
      2. Compare against preimage.content_hash
      3. If hashes differ: refuse to write, return PatchResult(applied=False, reason="file modified since scan by concurrent session")
      4. If hashes match: apply all operations to the frontmatter slice, reassemble the full text, write it to a temp file (path.with_suffix('.tmp')), then os.replace the temp file onto path for atomicity
    • Returns PatchResult(applied: bool, reason: str, before_hash: str, after_hash: str)
  • Implement reassemble_file(original_text: str, patched_fm: str, start_offset: int, end_offset: int) -> str:

    • Splice the patched frontmatter back into the original text at the correct offsets
    • Preserve everything before start_offset and after end_offset (including the --- fences)
  • Shadow parser note (blind spot #5): roadmap_scan.py (1462 lines) has its own split_frontmatter, parse_section_file, parse_index_file (~600 lines of parsing logic). This is LEAK:algorithmic-duplication with plan_corpus. The text patcher MUST NOT introduce a third frontmatter parser. It uses plan_corpus.types.FRONTMATTER_FENCE for boundary detection. The full roadmap_scan.py parser refactoring to import plan_corpus is tracked separately (it is a prerequisite for --quick mode correctness in 03.5, since /continue-roadmap and /verify-roadmap --quick must agree on corpus parse results). Migration tracked as concrete - [ ] in §05.3 (L187: “roadmap_scan.py shadow parser migration”) with <!-- unblocks:03.5 -->.

  • Tests (TDD):

    • Semantic pin: rename_key preserves YAML comments on the same line (name: foo # this is important)
    • Semantic pin: rename_key preserves YAML comments on adjacent lines
    • Semantic pin: remove_key handles multi-line YAML values (indented continuation lines)
    • Negative pin: apply_patch refuses write when preimage hash mismatches (concurrent modification)
    • Negative pin: apply_patch refuses write on malformed files (no frontmatter fences)
    • Atomicity test: interrupt during write -> original file intact (temp file may remain)
    • Round-trip test (TPR-03-003-gemini): extract -> modify -> reassemble -> parse with plan_corpus.parser (NOT yaml.safe_load) produces expected YAML dict — the strict parser must accept the patched output, not just a lenient YAML loader
    • Key ordering test: unmodified keys retain their original order after patch
    • Comment preservation test: YAML comments (# ...) and inline comments survive all operations
    • remove_list_item test: both inline [a, b] and block - a\n- b list styles handled
    • Collision guard integration test: plan: exists and name: exists with different values -> ExposureReview classification -> patcher never invoked
  • Subsection close-out (03.4) — MANDATORY before starting 03.5:

    • All tasks above are [x] and text patcher tested with comment-preserving round-trips
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no changes, document briefly. Fix any drift NOW.
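The extract, patch, and reassemble steps from 03.4 can be sketched as a pair of functions. The FENCE pattern below is a hypothetical stand-in for plan_corpus.types.FRONTMATTER_FENCE, whose exact definition is not shown in this section; the real patcher is expected to import that constant rather than define its own.

```python
import re
from typing import Tuple

# Hypothetical stand-in for plan_corpus.types.FRONTMATTER_FENCE.
FENCE = re.compile(r"^---[ \t]*(?:\r?\n|\Z)", re.MULTILINE)


def extract_frontmatter_slice(text: str) -> Tuple[str, int, int]:
    """Return (frontmatter_text, start_offset, end_offset), fences excluded.

    Returns ("", 0, 0) when the opening or closing fence is missing.
    """
    opening = FENCE.match(text)  # the opening fence must be the first line
    if opening is None:
        return "", 0, 0
    closing = FENCE.search(text, opening.end())
    if closing is None:
        return "", 0, 0
    start, end = opening.end(), closing.start()
    return text[start:end], start, end


def reassemble_file(original_text: str, patched_fm: str, start_offset: int, end_offset: int) -> str:
    """Splice patched frontmatter back in; fences and body stay byte-identical."""
    return original_text[:start_offset] + patched_fm + original_text[end_offset:]


doc = "---\nplan: x  # c\n---\n# Body\n---\nnot a fence pair\n"
fm, start, end = extract_frontmatter_slice(doc)
rebuilt = reassemble_file(doc, fm.replace("plan:", "name:", 1), start, end)
```

Because the splice works on offsets into the original string, a later --- line in the body is never touched, and the offsets remain valid even when the patched slice changes length.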

03.5 Continue-Roadmap Integration

File(s): .claude/skills/verify-roadmap/SKILL.md, integration with roadmap-scan.sh

Integrate the findings report with /continue-roadmap so cross-plan conflicts surface during active roadmap work, not only during explicit /verify-roadmap runs.

  • Add a lightweight cross-plan check to roadmap-scan.sh:

    • Before /continue-roadmap selects the next section to work on, run a fast subset of the DAG analysis
    • Check whether the selected section has BLOCKED or DEAD_REFERENCE findings (the two classifiers included in --quick mode — NOT CONFLICT, which requires O(N^2) shared-subsystem analysis)
    • If findings exist, display them before proceeding and let the user decide whether to continue or switch to resolving the finding
  • Design the integration interface — resolve scope contradiction (blind spot #9):

    • The verify-roadmap skill exposes a --quick mode that runs ONLY BLOCKED and DEAD_REFERENCE checks (fast, no shared-subsystem analysis, no git signal population per 03.1)
    • Explicitly NOT included in --quick: CONFLICT (requires shared-subsystem analysis which is O(N^2)), STATUS_CONTRADICTION (requires body scanning), SUPERSEDED (requires reroute resolution), MISSING_DEPENDENCY (requires full prose scan)
    • The full mode (--full) runs all classifiers from Sections 01-02, runs classify_safety with full WriteBackContext, and applies auto-fixes
    • /continue-roadmap calls --quick mode as a pre-check; users invoke --full explicitly
    • --quick mode MUST NOT build WriteBackContext (blind spot #10): quick mode only runs read-only DAG checks. It skips git signal population entirely (no git log subprocess calls). It passes context=None to classify_safety (see 03.1), which returns ExposureReview for all findings. Report is generated in report-only mode (no auto-fix).
  • Document the integration in SKILL.md:

    • How /continue-roadmap uses the quick check
    • When to run /verify-roadmap --full manually (after plan changes, before major milestones)
    • How to interpret and act on findings
    • Explicit list of what --quick checks vs what --full checks (no ambiguity)
  • Shadow parser migration (blind spot #5, TPR-03-001-gemini mandate): roadmap_scan.py has ~600 lines of parsing logic (split_frontmatter, parse_section_file, parse_index_file) that duplicates plan_corpus. --quick mode MUST use plan_corpus for parsing — two diverging corpus truths is a LEAK:algorithmic-duplication that violates SSOT-2. Mandated approach (Option A): refactor roadmap_scan.py to import plan_corpus.load_and_validate as the sole parsing entrypoint (per Section 01’s SSOT boundary — downstream consumers MUST NOT call split_frontmatter_strict directly), keeping only the /continue-roadmap-specific logic (section selection, focus plan, health signals). This eliminates the errors="replace" + {} on YAMLError swallowed-error pattern (roadmap_scan.py:327-348) that Section 01 was designed to prevent. Option B (shadow parser divergence) is explicitly rejected — it would allow the known LEAK to survive with no committed follow-up, violating R-2 and R-3. The migration is tracked as a concrete - [ ] in Section 05.

  • Tests (TDD):

    • Integration test: /verify-roadmap --quick returns findings for a corpus with a known BLOCKED finding
    • Negative test: /verify-roadmap --quick does NOT return CONFLICT findings (not in --quick scope)
    • Performance test: --quick mode completes in < 5 seconds on the full corpus (no git log calls)
    • Semantic pin: --quick mode with context=None -> all findings classified as ExposureReview
  • Subsection close-out (03.5) — MANDATORY before marking section complete:

    • All tasks above are [x] and integration tested
    • Update this subsection’s status in section frontmatter to complete
    • Run /improve-tooling retrospectively on THIS subsection
    • Run /sync-claude on THIS subsection — check whether changes invalidated any CLAUDE.md, .claude/rules/*.md, or canon.md claims. If no changes, document briefly. Fix any drift NOW.
    • Repo hygiene check — run diagnostics/repo-hygiene.sh --check and clean any detected temp files.
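
The quick-mode contract above (no WriteBackContext is built, context=None is passed to classify_safety, and every finding comes back as ExposureReview) can be sketched as follows. This is a minimal illustration, not the real classifier: the field layouts of Finding and WriteBackContext, the run_quick helper, and the single SafeFix rule are assumptions made for the example; the actual SafeFix table lives in 03.1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SafetyClass(Enum):
    SAFE_FIX = "SafeFix"
    EXPOSURE_REVIEW = "ExposureReview"


@dataclass(frozen=True)
class WriteBackContext:
    # Hypothetical stand-in; the real context carries git signals and more.
    git_clean: bool


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal shape, only what this sketch needs.
    category: str
    source: str


def classify_safety(finding: Finding, context: Optional[WriteBackContext]) -> SafetyClass:
    # Without a context there is no evidence that a write is safe.
    if context is None:
        return SafetyClass.EXPOSURE_REVIEW
    # Illustrative rule only; not the real SafeFix table.
    if finding.category == "SCHEMA_VIOLATION" and context.git_clean:
        return SafetyClass.SAFE_FIX
    return SafetyClass.EXPOSURE_REVIEW


def run_quick(findings: list[Finding]) -> list[SafetyClass]:
    # Quick mode never builds a WriteBackContext, so nothing is auto-fixable.
    return [classify_safety(f, None) for f in findings]
```

Because the context is the only input that can promote a finding to SafeFix, passing None makes the report-only behaviour a property of the classifier rather than a flag the caller must remember to check.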

03.R Third Party Review Findings

  • [TPR-03-001-codex][high] section-03:107 — Align schema-driven SafeFix rules with the schema SSOT. OverviewSchema uses plan: canonically, not name:. SPEC_FILE_NOT_FOUND needs its own handling. Resolved: Fixed on 2026-04-14. Corrected SafeFix table: plan:→name: rename restricted to PlanIndexSchema files only; OverviewSchema explicitly excluded. SPEC_FILE_NOT_FOUND reclassified to ExposureReview.
  • [TPR-03-002-codex][medium] section-03:330 — Make quick-mode continue-roadmap contract consistent. BLOCKED vs CONFLICT scope contradiction. Resolved: Fixed on 2026-04-14. Integration bullets now consistently specify BLOCKED + DEAD_REFERENCE only for --quick mode.
  • [TPR-03-003-codex][medium] section-03:286 — Propagate unapplied patch results into the report. Resolved: Fixed on 2026-04-14. Added concurrent-modification propagation to auto-fix guards (SafeFix→ExposureReview on hash mismatch), unapplied-fix report surface in 03.2, and test pin.
  • [TPR-03-004-codex][medium] section-03:348 — Replace roadmap_scan migration placeholder with concrete checkbox. Resolved: Fixed on 2026-04-14. Option B rejected. Concrete - [ ] added to Section 05.3 mandating Option A migration.
  • [TPR-03-005-codex][medium] section-03:240 — Either consume Section 02 SUPERSEDED handoff or remove it. Resolved: Fixed on 2026-04-14. Added explicit design decision: all SUPERSEDED → ExposureReview; WriteBackContext available for future SafeFix graduation.
  • [TPR-03-001-gemini][high] section-03:278 — Remove Option B and mandate shadow parser migration. Resolved: Fixed on 2026-04-14. Same fix as TPR-03-004-codex — Option B removed, Option A mandated, Section 05.3 item added.
  • [TPR-03-002-gemini][high] section-03:192 — Propagate concurrent modification failures to findings report. Resolved: Fixed on 2026-04-14. Same fix as TPR-03-003-codex — auto-fix dispatcher converts to ExposureReview, report format surfaces unapplied fixes.
  • [TPR-03-003-gemini][medium] section-03:253 — Explicitly require plan_corpus.parser in round-trip test. Resolved: Fixed on 2026-04-14. Test description updated to specify plan_corpus.parser, not yaml.safe_load.

Round 2 findings (iteration 2, 2026-04-14):

  • [TPR-03-001-codex-r2][medium] section-05:187 — Replace roadmap_scan migration with real plan_corpus API surface. References to nonexistent parse_section_file/parse_index_file. Resolved: Fixed on 2026-04-14. Updated §03.4 and §05.3 to use actual API: read_text_strict, split_frontmatter_strict, load_and_validate.
  • [TPR-03-002-codex-r2][high] section-02:251 — Align §02 SUPERSEDED handoff with §03’s all-ExposureReview decision. §02 still said “to route SafeFix vs ExposureReview.” Resolved: Fixed on 2026-04-14. Updated §02 handoff text: git_status enrichment is advisory/reporting, not SafeFix routing. All SUPERSEDED → ExposureReview.
  • [TPR-03-001-gemini-r2][high] section-03:68 — classify_safety needs parsed frontmatter for collision guard purity. Resolved: Fixed on 2026-04-14. Added frontmatter_data: dict | None = None parameter to classify_safety signature. Pre-parsed at scan time; no I/O inside classifier.
  • [TPR-03-002-gemini-r2][medium] section-03:71 — Paired UNKNOWN_FIELD/MISSING_REQUIRED_FIELD deduplication for plan:→name: rename. Resolved: Fixed on 2026-04-14. Added paired-finding deduplication with resolved_by_sibling field on ClassifiedFinding.
  • [TPR-03-003-gemini-r2][medium] section-05:46 — --quick mode must include Phase 5 for report generation. Resolved: Fixed on 2026-04-14. Updated §05.1: --quick runs Phases 1-3 and 5 (report-only, no auto-fix). Phase 4 skipped.

Round 3 findings (iteration 3, 2026-04-14):

  • [TPR-03-001-codex-r3][high] section-05:68 — Point §05 phase wiring at real plan_corpus entrypoints (python -m scripts.plan_corpus, not the legacy single-file .py path). Resolved: Fixed on 2026-04-14. Updated Phase 1/2 entrypoints to actual package API.
  • [TPR-03-002-codex-r3][high] section-05:113 — Realign §05 validation cases (a)/(g) with live route A/B behavior. Current corpus = MISSING_DEPENDENCY, not BLOCKED. Resolved: Fixed on 2026-04-14. Updated both test cases to expect MISSING_DEPENDENCY (route B), with note about route A migration.
  • [TPR-03-003-codex-r3][medium] section-05:187 — Route roadmap_scan migration through load_and_validate, not low-level split_frontmatter_strict. Resolved: Fixed on 2026-04-14. Updated migration item to use load_and_validate as sole entrypoint per §01 SSOT boundary.
  • [TPR-03-004-codex-r3][medium] section-05:178 — Undefined --check mode; replaced with --full --no-auto-fix. Resolved: Fixed on 2026-04-14. Changed verification step to use existing --full --no-auto-fix mode.
  • [TPR-03-005-codex-r3][medium] section-03:114 — Carry resolved_by_sibling through the report contract. Resolved: Fixed on 2026-04-14. Updated §03.2 JSON spec to include resolved_by_sibling field.

Round 4 findings (iteration 4, 2026-04-15 — close-out dual-source TPR):

  • [TPR-03-001-codex-r4][high] scripts/verify_roadmap/patcher.py:306 — Refuse patch writes that escape the reviewed plan corpus. Resolved: Fixed on 2026-04-15. Added required corpus_root: Path parameter to apply_patch(). The target path is resolved and checked against corpus_root with relative_to(); when the path escapes, apply_patch() refuses the write and returns PatchResult(applied=False). 3 negative-pin tests added in test_patcher.py::TestApplyPatchPathEscape. Propagated to apply_fixes() + PatcherFn type + all test call sites. Agreement: [TPR-03-001-gemini-r4] (same fix resolves both)
  • [TPR-03-001-gemini-r4][medium] scripts/verify_roadmap/patcher.py:166 — Missing path escape check in concurrent-session guards. Resolved: Fixed on 2026-04-15. Same fix as [TPR-03-001-codex-r4] (agreement). Agreement: [TPR-03-001-codex-r4] (same fix resolves both)
  • [TPR-03-002-codex-r4][medium] scripts/verify_roadmap/safety.py:286 — Implement sibling dedup for plan-to-name rename findings. Resolved: Fixed on 2026-04-15. Created scripts/verify_roadmap/pairing.py (separate from safety.py per 500-line BLOAT rule). pair_resolved_by_sibling() detects UNKNOWN_FIELD(plan) SafeFix rename + MISSING_REQUIRED_FIELD(name) on same PlanIndex file and marks the dependent half with resolved_by_sibling=<rename_id>. Wired into quick.py after classify_safety list comp. 9 tests in test_pairing.py (3 positive + 6 negative pins). Exported from __init__.py. Agreement: [TPR-03-002-gemini-r4] (same pairing concern; both layers addressed)
  • [TPR-03-002-gemini-r4][medium] scripts/verify_roadmap/auto_fix.py:202 — Auto-fix engine does not skip resolved_by_sibling findings. Resolved: Fixed on 2026-04-15. Added if cf.resolved_by_sibling is not None: continue guard in build_fix_plans(). Regression test test_skips_resolved_by_sibling added in test_auto_fix.py::TestBuildFixPlans. Agreement: [TPR-03-002-codex-r4] (same pairing concern; both layers addressed)
  • [TPR-03-003-codex-r4][low] .claude/skills/continue-roadmap/roadmap_scan.py:1482 — Log verify-quick degradation failures without requiring trace mode. Resolved: Fixed on 2026-04-15. Changed exception handler from trace(...) (no-op without --trace) to sys.stderr.write(f"[verify-quick] degradation: ...") (unconditional stderr). Import-failure and banner-ordering integration tests are tracked for test_quick.py but deferred to §04/§05 implementation scope (the integration point is in roadmap_scan.py which is outside §03’s owned modules).
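
The path-escape guard described in the round-4 patcher findings can be sketched with pathlib alone. This is a hedged illustration: check_within_corpus is a hypothetical helper name, and the PatchResult here carries only the two fields the example needs (the reason field is an assumption), not the real dataclass from patcher.py.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class PatchResult:
    # Hypothetical minimal shape; the real PatchResult has more fields.
    applied: bool
    reason: str = ""


def check_within_corpus(target: Path, corpus_root: Path) -> Optional[PatchResult]:
    """Return a refusal when target resolves outside corpus_root, else None."""
    resolved_root = corpus_root.resolve()
    resolved_target = target.resolve()  # collapses ".." and follows symlinks
    try:
        # relative_to raises ValueError when the target is not under the root.
        resolved_target.relative_to(resolved_root)
    except ValueError:
        return PatchResult(applied=False, reason=f"{target} escapes {corpus_root}")
    return None
```

Resolving both paths before the comparison matters: a purely lexical check would accept a path such as plans/../etc/passwd because it starts with the corpus prefix.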

Round 4 iteration 2 findings (2026-04-15 — re-review after round-4 fixes):

  • [TPR-03-001-codex-r4i2][low] scripts/verify_roadmap/pairing.py:98 — Stop using the rationale string as pairing state. Evidence: pair_resolved_by_sibling identified the rename sibling via other.rationale.startswith(_RENAME_RATIONALE_PREFIX). Prose rationale is produced in safety.py:300; a wording edit would silently break pairing. Impact: No structural source of truth for “this is the rename half” — fragile coupling through prose. Resolved: Fixed on 2026-04-15. Added PAIRING_TAG_PLAN_TO_NAME_RENAME constant + pairing_tag: str | None field on ClassifiedFinding. Classifier sets the tag on the rename-case SafeFix. Pairing function matches other.pairing_tag == PAIRING_TAG_PLAN_TO_NAME_RENAME instead of rationale string. All 9 test_pairing.py tests updated to pass the tag. Basis: direct_file_inspection. Confidence: high.

Round 4 iteration 3 findings (2026-04-15 — re-review after pairing_tag fix):

  • [TPR-03-001-codex-r4i3][medium] scripts/verify_roadmap/auto_fix.py:102 — Replace rationale-string dispatch for unknown-field fixes with structural state. Evidence: _dispatch_unknown_field() still chose between REMOVE_KEY and RENAME_KEY by inspecting cf.rationale for prose fragments. Same fragile-coupling pattern just fixed in pairing.py. Resolved: Fixed on 2026-04-15. Replaced rationale-based dispatch with if cf.pairing_tag == PAIRING_TAG_PLAN_TO_NAME_RENAME: structural check. Imported the constant; updated test fixture _safe_fix() to accept pairing_tag. Basis: direct_file_inspection. Confidence: high.
  • [TPR-03-002-codex-r4i3][low] tests/plan-audit/test_pairing.py:77 — Add integration pin for classify_safety emitting the pairing tag. Evidence: Pairing tests hand-construct the tag; safety tests stop at asserting SafeFix. No test exercises the real classifier→pairing handoff. Resolved: Fixed on 2026-04-15. Added test_unknown_field_plan_key_rename_emits_pairing_tag in test_safety.py — calls classify_safety() directly and asserts result.pairing_tag == PAIRING_TAG_PLAN_TO_NAME_RENAME. 289 tests now pass. Basis: direct_file_inspection. Confidence: high.
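
The move from rationale-string matching to a structural tag, as described in the r4i2 and r4i3 findings, can be sketched as below. The constant's string value, the ClassifiedFinding field layout, and the choose_operation helper are assumptions for illustration; only the names PAIRING_TAG_PLAN_TO_NAME_RENAME, pairing_tag, REMOVE_KEY, and RENAME_KEY come from the findings above.

```python
from dataclasses import dataclass
from typing import Optional

# The actual string value of the constant is not given in the plan; assumed here.
PAIRING_TAG_PLAN_TO_NAME_RENAME = "plan-to-name-rename"


@dataclass(frozen=True)
class ClassifiedFinding:
    # Hypothetical minimal shape, only what the dispatch needs.
    finding_id: str
    rationale: str
    pairing_tag: Optional[str] = None


def choose_operation(cf: ClassifiedFinding) -> str:
    # Dispatch on structural state; the prose rationale is never inspected.
    if cf.pairing_tag == PAIRING_TAG_PLAN_TO_NAME_RENAME:
        return "RENAME_KEY"
    return "REMOVE_KEY"
```

With this shape, rewording the rationale cannot change which operation is chosen, which is the coupling both findings set out to remove.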

Round 4 iteration 4 findings (2026-04-15 — structural target_key):

  • [TPR-03-001-codex-r4i4][medium] scripts/verify_roadmap/pairing.py:87 — Carry schema field identity structurally instead of parsing Finding.description. Evidence: Downstream flow still parsed prose to identify which field a finding refers to. pair_resolved_by_sibling() checked "name" in f.description.lower(); _dispatch_unknown_field() checked "plan" in desc; _classify_missing_required_field() checked "reviewed" in desc. Resolved: Fixed on 2026-04-15. Added target_key: str | None = None to Finding dataclass in plan_corpus/types.py. _check_required_fields() and _check_unknown_fields() in schema.py now populate it with the actual key name. All downstream modules (safety.py, auto_fix.py, pairing.py) now dispatch on finding.target_key — zero prose parsing for field identity. Agreement: [TPR-03-001-gemini-r4i4] (same systemic issue, different angle)
  • [TPR-03-001-gemini-r4i4][high] scripts/verify_roadmap/auto_fix.py:99 — Remove prose-string fragility across auto_fix, pairing, and safety modules. Resolved: Fixed on 2026-04-15. Same fix as [TPR-03-001-codex-r4i4] — structural target_key field eliminates all prose-based field dispatch. Agreement: [TPR-03-001-codex-r4i4]
  • [TPR-03-002-gemini-r4i4][high] scripts/verify_roadmap/auto_fix.py:165 — Remove fragile string splitting for dead reference extraction. Evidence: _dispatch_dead_reference parses f.description via rsplit(":", 1)[1].strip() to extract the dead reference value. Comment notes this is “best-effort.” Requires structural value passing from the upstream DAG validator — crosses into plan_corpus/dag.py which constructs the dead-reference findings. Impact: Prose-string fragility; any description format change breaks auto-fix. Resolved: Fixed on 2026-04-15 during §03 close-out. Scope estimate was overstated — the fix was ~30 lines across plan_corpus/types.py (new Finding.target_value: str | None field + id-hash + to_json), plan_corpus/docgen.py (2 DEAD_REFERENCE sites + _find_section_file signature), plan_corpus/dag.py (4 DEAD_REFERENCE construction sites), and verify_roadmap/auto_fix.py (structural read + defense-in-depth panic on missing target_value). 7 matrix regression tests added (test_auto_fix.py::TestBuildFixPlanDeadReference) covering positive pin, cross-plan-dep preservation, embedded-colon value preservation, description-format-change independence, None-panics defense-in-depth, and non-depends_on short-circuit — plus 1 dag-side pin in test_dag_classifiers.py verifying every classify_dead_reference finding carries target_value == evidence[0]. 616 plan-audit tests pass. §05:187 anchor item marked done.
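
The embedded-colon case mentioned in the dead-reference fix shows why the rsplit approach was fragile. The sketch below contrasts the two reads under stated assumptions: the Finding shape is reduced to two fields, the description format and the reference value are invented for the example, and a ValueError stands in for the defense-in-depth failure on a missing target_value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    # Hypothetical minimal shape; the real Finding has many more fields.
    description: str
    target_value: Optional[str] = None


def dead_reference_from_prose(finding: Finding) -> str:
    # The old best-effort read: everything after the last colon.
    return finding.description.rsplit(":", 1)[1].strip()


def dead_reference_structural(finding: Finding) -> str:
    # The structural read: fail loudly instead of guessing from prose.
    if finding.target_value is None:
        raise ValueError("dead-reference finding is missing target_value")
    return finding.target_value


finding = Finding(
    description="dead reference in depends_on: legacy:03",  # invented format
    target_value="legacy:03",
)
print(dead_reference_from_prose(finding))   # truncated at the embedded colon
print(dead_reference_structural(finding))   # full value, independent of prose
```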

Round 5 findings (2026-04-15 — section close-out dual-source review):

Resolution summary: all 14 round-5 findings (13 actionable + 1 informational) fixed on 2026-04-15 in commit 0bfd9e93 (“fix(verify-roadmap): apply 13 TPR-03 round-5 findings to auto-fix engine”). 15 regression tests added across test_patcher.py, test_auto_fix.py, test_plan_corpus.py, test_dag_precedence.py, and test_safety.py to pin each fix. All 493 plan-audit tests pass.

  • [TPR-03-001-codex][high] scripts/verify_roadmap/patcher.py:170 — Fix block-valued after_key insertion. Evidence: insert_key() inserts immediately after the anchor line matched by ^after_key:. auto_fix.py:128-133 uses after_key="sections" for the third_party_review SafeFix, but sections is normally a block list. Replaying the live function on a normal section frontmatter produced sections: followed by third_party_review: and then the original - id: entries, which split_frontmatter_strict() rejects as invalid YAML. Impact: A routine SafeFix can corrupt section frontmatter instead of normalizing it, which violates the text-patcher safety contract and turns a missing-field cleanup into a broken plan file. Required plan update: Make insert_key() understand block-valued anchors: when after_key owns an indented list or mapping, insert after the whole block rather than after the header line. Add a regression test that inserts third_party_review after a populated sections: list and reparses the result through split_frontmatter_strict(). Basis: fresh_verification. Confidence: high. Agreement: [TPR-03-001-gemini] (both reviewers flagged this — codex cited line 170, gemini cited line 105; same root cause)
  • [TPR-03-001-gemini][high] scripts/verify_roadmap/patcher.py:105 — insert_key corrupts YAML when after_key has a multiline block value (e.g., sections). Evidence: When after_key="sections", insert_key matches the single line ^sections:.*\n and inserts third_party_review immediately after it. Because sections is a block sequence (list), its indented items (- id: ...) are pushed down and become incorrectly associated with the newly inserted third_party_review key. This produces invalid YAML syntax (mixing mapping and sequence items at the same indentation level). Required plan update: Update insert_key to skip indented lines following the after_key match before inserting the new key, similar to the logic already used in remove_key. Basis: direct_file_inspection. Confidence: high. Agreement: [TPR-03-001-codex] (both reviewers flagged this location/title)
  • [TPR-03-002-codex][high] scripts/plan_corpus/types.py:260 — Hash target_key into Finding.id when present. Evidence: Finding.id hashes only category, subtype, source, source_line, plus conditional source_column and target. It ignores the new target_key field entirely. Two live MISSING_REQUIRED_FIELD findings on the same file with target_key='name' and target_key='full_name' both produced the same id (VR-9e2667). Impact: Distinct schema findings alias each other in reports, pairing, and any downstream bookkeeping keyed by Finding.id. That directly undercuts the structural-field migration because the new discriminator exists but does not stabilize identity. Required plan update: Extend Finding.id to append target_key when non-null, preserving backward compatibility the same way source_column and target are handled. Add regression pins for same-file same-subtype findings that differ only by target_key. Basis: fresh_verification. Confidence: high.
  • [TPR-03-003-codex][high] scripts/verify_roadmap/patcher.py:348 — Close the hash-check-to-replace race window. Evidence: apply_patch() hashes the file once, builds new_bytes, writes a temp file, and then unconditionally calls os.replace() at line 406. There is no second identity check or lock between the preimage comparison and the final replace. A concurrent edit that lands after the hash check but before os.replace() will be silently overwritten. Impact: The documented concurrent-session guarantee is not actually met under a real overlapping write: the patcher can still clobber another session’s newer contents even though the file changed between scan and write. Required plan update: Add a second pre-replace guard or locking/CAS-equivalent around the destination path, then add a race regression test that mutates the file after the initial hash check and verifies refusal rather than overwrite. Basis: fresh_verification. Confidence: high.
  • [TPR-03-002-gemini][high] scripts/verify_roadmap/auto_fix.py:214 — Auto-fix engine self-collides on multiple findings for the same file. Evidence: apply_fixes iterates through classifieds individually. If multiple findings target the same file, the first finding successfully patches the file and updates its content hash on disk. The second finding reads the original preimage from the unmodified preimages dictionary, which now mismatches the file’s new content hash. The second finding fails the concurrent-session guard and surfaces as an unapplied fix. Required plan update: Update the preimages dictionary (or a local tracker) with patch_result.after_hash upon a successful patch, so subsequent fixes for the same file in the same batch use the updated hash and succeed. Basis: direct_file_inspection. Confidence: high.
  • [TPR-03-003-gemini][high] scripts/verify_roadmap/patcher.py:152 — remove_list_item fails to locate block-style lists if the key line contains an inline comment. Evidence: key_pattern = re.compile(rf"^{re.escape(list_key)}\s*:\s*$") strictly expects nothing but whitespace after the colon. If the frontmatter has depends_on: # comment, the pattern fails to match, in_list never becomes True, and the item is not removed. Required plan update: Relax key_pattern to allow inline comments, e.g., re.compile(rf"^{re.escape(list_key)}\s*:"), ensuring consistency with remove_key. Basis: direct_file_inspection. Confidence: high.
  • [TPR-03-004-codex][medium] scripts/verify_roadmap/patcher.py:311 — Carry the original finding id through patch failures. Evidence: The real patcher hardcodes finding_id = "VR-patch" for every returned PatchResult. A live apply_patch() refusal returned PatchResult(..., finding_id='VR-patch', ...). report.py surfaces PatchResult.finding_id directly, so real unapplied-fix output cannot identify which original finding failed. Impact: When concurrent-modification or malformed-frontmatter refusals happen, the report loses the link back to the originating finding. That makes manual follow-up and audit correlation materially harder. Required plan update: Propagate the real finding id into the patcher boundary, or have apply_fixes() overwrite the returned PatchResult.finding_id with plan.finding_id before reporting. Add an end-to-end test that exercises the real patcher path and asserts the original finding id survives into unapplied_fixes. Basis: fresh_verification. Confidence: high.
  • [TPR-03-005-codex][medium] scripts/verify_roadmap/auto_fix.py:377 — Demote refused SafeFixes back to ExposureReview. Evidence: When PatchResult(applied=False) comes back, apply_fixes() only appends the patch result to unapplied_results. It never emits the plan-required demoted ClassifiedFinding with an updated rationale, and FixApplyResult has no bucket for that state. The current result therefore keeps the original finding only in planned_findings and loses the promised manual-review reclassification. Impact: The tool surfaces a bare refusal record instead of a concrete follow-up finding that reviewers can triage in the normal findings stream. That is weaker than the Section 03 contract for failed SafeFixes. Required plan update: Add a demoted-findings bucket to FixApplyResult or otherwise thread failed SafeFixes back into report generation as ExposureReview findings with the refusal reason appended to the rationale. Extend report tests to assert both the demoted finding and the unapplied_fixes entry are present. Basis: direct_file_inspection. Confidence: high.
  • [TPR-03-006-codex][medium] scripts/verify_roadmap/safety.py:252 — Implement the promised reroute: false SafeFix. Evidence: schema.py:376-383 emits SCHEMA_VIOLATION/CROSS_FIELD_INVARIANT for reroute: false, and the owning Section 03 plan still marks removal of that default-equivalent field as implemented. But classify_safety() routes all remaining schema-violation subtypes to ExposureReview, and auto_fix.py has no handler for removing reroute. Impact: One of the explicitly promised frontmatter normalizations is still manual-only. The close-out status overstates the implemented SafeFix surface. Required plan update: Add a concrete SafeFix rule for the reroute: false invariant on the appropriate schema class, wire it to a REMOVE_KEY operation, and add classifier plus dispatcher tests for that path. Basis: fresh_verification. Confidence: high.
  • [TPR-03-007-codex][medium] scripts/plan_corpus/types.py:270 — Serialize target_key in machine-readable findings. Evidence: The new target_key field exists on Finding, and schema.py now populates it, but Finding.to_json() does not include it. Live render_json() output for a schema finding likewise omitted target_key, so JSON reports and DagReport.to_json() still drop the structural field entirely. Impact: Downstream consumers cannot rely on the structured key identity that Section 03 added to eliminate prose parsing. The report remains less machine-parseable than the migration intends. Required plan update: Add target_key to Finding.to_json(), update any JSON documentation that describes the finding shape, and add report/plan-corpus tests that assert the field survives serialization. Basis: fresh_verification. Confidence: high.
  • [TPR-03-008-codex][medium] scripts/verify_roadmap/report.py:132 — Complete the report metadata and target/type rendering. Evidence: JSON metadata currently contains timestamp, mode, and counts, but no classifier-version information even though Section 03 requires it. The human renderers also omit type/target context: _render_md_finding() only shows source plus description, and render_console() prints the same. A live rendering of a finding with a populated target showed no source -> target context in either markdown or console output. Impact: The report is less actionable for humans and less auditable for tooling than the owning section promises. Reviewers cannot see the target or classifier type at a glance, and JSON metadata does not carry versioning context for downstream consumers. Required plan update: Add classifier-version metadata to JSON output and include category/subtype plus source -> target context in markdown and console renderers. Extend report tests to pin those fields. Basis: fresh_verification. Confidence: high.
  • [TPR-03-004-gemini][medium] scripts/plan_corpus/dag.py:246 — Field dropping in manual Finding reconstruction. Evidence: Functions enrich_resolve_dep_finding, apply_precedence, and apply_source_kind_severity manually reconstruct Finding objects when modifying a single field. They fail to pass target_key=finding.target_key to the constructor, meaning the new structural dispatch field is silently dropped when findings pass through the DAG pipeline. Required plan update: Use dataclasses.replace(finding, ...) to safely copy findings without dropping fields when the Finding schema is extended. Basis: inference. Confidence: unknown.
  • [TPR-03-009-codex][low] scripts/verify_roadmap/auto_fix.py:381 — Record before/after snippets in the auto-fix audit log. Evidence: Section 03.3 promises fixes-applied.json entries with finding id, file path, fix type, before/after snippet, and timestamp. The current audit entries capture operations, hashes, backup path, and timestamp, but they never record any snippet of the text that changed. Impact: The audit log is weaker than specified for forensic review: understanding what changed requires opening the backup or re-running the patch logic instead of reading the log entry itself. Required plan update: Capture bounded before/after frontmatter snippets for each applied fix and add a regression test that asserts the audit log contains them. Basis: direct_file_inspection. Confidence: high.
  • [TPR-03-005-gemini][informational] scripts/plan_corpus/schema.py:124 — target_key is not populated for nested schema violations. Evidence: When emitting MISSING_REQUIRED_FIELD or UNKNOWN_FIELD for nested fields inside sections lists, target_key is omitted. While safe because auto_fix.py only handles top-level keys currently, it leaves nested fields reliant on prose parsing if auto-fixes are ever extended to them. Required plan update: Consider populating target_key with a path string (e.g., sections[].status) for nested violations to fully eliminate prose parsing in future auto-fix extensions. Basis: inference. Confidence: unknown.
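
The block-valued anchor problem flagged by both reviewers for insert_key can be illustrated with a small line-level sketch. It is not the real patcher: the signature, the KeyError on a missing anchor, and the placeholder value used in the usage note are assumptions. It operates on the raw frontmatter text between the --- fences, so no fence lines appear in the input.

```python
import re


def insert_key(frontmatter: str, after_key: str, new_line: str) -> str:
    """Insert new_line after after_key, skipping past the anchor's block value."""
    lines = frontmatter.splitlines(keepends=True)
    anchor = re.compile(rf"^{re.escape(after_key)}\s*:")
    for i, line in enumerate(lines):
        if not anchor.match(line):
            continue
        j = i + 1
        # Indented lines, "-" items, and blank lines belong to the anchor's block.
        while j < len(lines) and (
            lines[j].startswith((" ", "\t", "-")) or not lines[j].strip()
        ):
            j += 1
        # Do not swallow blank lines that merely trail the block.
        while j > i + 1 and not lines[j - 1].strip():
            j -= 1
        # Make sure the preceding line is terminated before inserting.
        if not lines[j - 1].endswith("\n"):
            lines[j - 1] += "\n"
        lines.insert(j, new_line.rstrip("\n") + "\n")
        return "".join(lines)
    raise KeyError(after_key)
```

For example, calling insert_key(text, "sections", "third_party_review: placeholder") on frontmatter with a populated sections: list places the new key after the last list item rather than between the sections: header and its first - id: entry. A scalar anchor behaves as before, since no indented lines follow it.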

Round 5 iteration 2 findings (2026-04-15 — re-review after round-1 fixes):

  • [TPR-03-001-codex-r5i2][medium] tests/plan-audit/test_safety.py:377 — Exercise the reroute SafeFix through the real producer and dispatcher. Evidence: The new reroute coverage only hand-constructs Finding(..., target_key="reroute") and asserts classify_safety() returns SAFE_FIX. It never proves that real schema validation produces that target_key (scripts/plan_corpus/schema.py:377-385), and it never reaches the newly added dispatcher that must emit REMOVE_KEY("reroute") (scripts/verify_roadmap/auto_fix.py:145-160). If _validate_plan_index() stopped populating target_key or _dispatch_cross_field_invariant() regressed to return (), the current 493-test suite would still pass. Impact: TPR-03-006-codex is not actually pinned end-to-end. The promised reroute: false auto-fix can regress in the real validate -> classify -> build-fix path without any failing regression test. Required plan update: Add an end-to-end regression test that starts from real plan-index frontmatter containing reroute: false, asserts the emitted finding carries target_key="reroute", then runs classify_safety() and build_fix_plan() (or dry-run apply_fixes) and checks that the planned operation is REMOVE_KEY for reroute. Basis: fresh_verification. Confidence: high. Resolved: Fixed on 2026-04-15. Added test_reroute_false_end_to_end_validate_classify_dispatch in test_safety.py — starts from real plan-index frontmatter with reroute: false, asserts _validate_plan_index emits target_key="reroute", then runs classify_safety() → SAFE_FIX and build_fix_plan() → REMOVE_KEY('reroute'). All three layers (producer/classifier/dispatcher) now pinned end-to-end.
  • [TPR-03-002-codex-r5i2][low] tests/plan-audit/test_report.py:152 — Pin classifier_version and target/type rendering in the report suite. Evidence: scripts/verify_roadmap/report.py now adds metadata.classifier_version (report.py:36-41,138-144), markdown Type / Reference / Target key lines (report.py:237-267), and console [category/subtype] plus source -> target output (report.py:290-331). But the report tests at tests/plan-audit/test_report.py:152-230 only assert timestamp, mode, corpus size, safety metadata, and unapplied-fix fields, and the markdown/console tests only check ordering and prefix presence. No assertion mentions classifier_version, category/subtype tags, target rendering, or target-key rendering. A revert of TPR-03-008-codex would therefore keep the suite green. Impact: The close-out claims that the report contract is pinned, but the new user-facing and machine-facing surfaces can drift silently. This weakens confidence in both the JSON contract and the human renderers. Required plan update: Extend tests/plan-audit/test_report.py with explicit JSON assertions for metadata.classifier_version and a finding carrying target/target_key, plus markdown and console assertions that category/subtype and source -> target rendering appear for a targeted finding. Basis: fresh_verification. Confidence: high. Resolved: Fixed on 2026-04-15. Added 6 pins in test_report.py: test_metadata_includes_classifier_version, test_json_finding_entry_serializes_target_key, test_markdown_renders_category_subtype_type, test_markdown_renders_source_to_target_reference, test_console_shows_category_subtype_tag, test_console_shows_source_to_target. A revert of TPR-03-008-codex would now fail the suite.
  • [TPR-03-001-gemini-r5i2][informational] scripts/verify_roadmap/patcher.py:1 — All 13 findings from TPR-03 round 1 verified as resolved. Evidence: Comprehensive verification pass confirmed: insert_key block-list skip; remove_list_item inline-comment regex; apply_patch CAS re-read; auto_fix working_preimages rollforward; Finding.id target_key hashing; to_json target_key serialization; demoted ExposureReview + finding_id propagation; dataclasses.replace in dag.py; structural reroute SafeFix across DAG/validator/taxonomy layers. Resolved: Verification-only informational finding — no code change required. Acknowledges the round-1 fix commit 0bfd9e93 as sound.
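
The "apply_patch CAS re-read" noted in the verification entry above can be sketched as a second hash check placed immediately before os.replace. The helper names sha256_of and replace_if_unchanged are hypothetical, and the real apply_patch() has a different signature and returns a PatchResult rather than a bool. Note that a re-read narrows the window between check and replace but does not close it entirely; fully excluding a concurrent writer would need a lock on the destination.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replace_if_unchanged(path: Path, expected_hash: str, new_bytes: bytes) -> bool:
    """Write new_bytes to path only if its content still matches expected_hash."""
    # First guard: the file must still match the preimage seen at scan time.
    if sha256_of(path) != expected_hash:
        return False
    # Stage the new content next to the destination so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
        # Second guard: re-read right before the swap to catch edits made meanwhile.
        if sha256_of(path) != expected_hash:
            return False
        os.replace(tmp_name, path)
        return True
    finally:
        # On refusal or error the staged file is removed; after a successful
        # replace it no longer exists under its temporary name.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
```

A refusal leaves the destination untouched and removes the staged file, so the caller can surface the fix as unapplied instead of overwriting another session's work.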

Round 5 iteration 3 findings (2026-04-15 — CLEAN PASS, re-review after round-2 fixes):

Dual-source verification of commit 3bccddd8. ZERO actionable findings, 3 informational acknowledgments. Both reviewers (codex + gemini) independently confirm the round-2 test-coverage fixes pin the reroute SafeFix chain and new report.py surfaces correctly. Thoroughness: both reviewers compliant (codex 12 files / 4 rules / tests rerun; gemini 7 files / 4 rules / tests rerun). TPR loop terminates clean.

  • [TPR-03-001-codex-r5i3][informational] tests/plan-audit/test_safety.py:394 — Keep the reroute producer/classifier/dispatcher chain pinned. Resolved: Verification-only. Acknowledges commit 3bccddd8 successfully pins all three layers end-to-end.
  • [TPR-03-002-codex-r5i3][informational] tests/plan-audit/test_report.py:171 — Keep the new report metadata and rendering surfaces pinned. Resolved: Verification-only. Acknowledges commit 3bccddd8 added 6 pins covering classifier_version, target_key JSON serialization, category/subtype markdown + console rendering, and source -> target rendering.
  • [TPR-03-001-gemini-r5i3][informational] tests/plan-audit/test_safety.py:394 — Acknowledge that commit 3bccddd8 successfully pins reroute auto-fix and report surfaces. Resolved: Verification-only. Gemini confirms the round-2 fixes are sound.

Round 6 findings (2026-04-15 — §03.N close-out TPR after commit baa01833):

Dual-source review of the structural Finding.target_value landing. Transport had infra friction (4 consecutive gemini capacity errors before attempt 5 succeeded) — semantic iteration budget untouched. Codex: 436s / 144 events / 6017-byte envelope. Gemini: 313s / 69 events / 2631-byte envelope, ran pytest as fresh_verification. 3 actionable findings, all [medium], all DRIFT:missing-regression-pin against the new target_value field. No agreements tagged by the merger, but TPR-03-002-codex-r6 and TPR-03-001-gemini-r6 target identical location and identical root cause — effectively an agreement.

  • [TPR-03-001-codex-r6][medium] tests/plan-audit/test_plan_corpus.py:488 — Add target_value sync-point regression pins. [DRIFT] Evidence: scripts/plan_corpus/types.py:296-340 now conditionally hashes and serializes target_value, parallel to target_key. But test_plan_corpus.py:488-564 pins only the four target_key cells (discriminator, legacy-None id preservation, to_json serialization, to_json null-for-legacy). No test would fail if target_value were removed from Finding.id or Finding.to_json(). Impact: The target_key/target_value symmetry rule in the rules brief is unpinned at the shared-type sync points. A future regression could silently drop target_value from id-discrimination or JSON output, reintroducing duplicate IDs for same-line dead-list findings or breaking machine-readable report consumers. Resolved: Fixed on 2026-04-15 during §03.N close-out round-6 loop. Added 4 sync-point pins in TestFindingTypeSafety parallel to the target_key quartet: test_id_discriminates_by_target_value (two DEAD_REFERENCE findings on the same line with different target_value values produce distinct ids), test_id_unchanged_when_target_value_is_none (legacy pre-extension hash preserved), test_to_json_serializes_target_value (JSON output includes the field), test_to_json_target_value_null_for_legacy_findings (legacy findings serialize as null, not absent).
  • [TPR-03-002-codex-r6][medium] tests/plan-audit/test_dag_classifiers.py:418 — Expand producer-side target_value matrix coverage. [GAP] Evidence: test_dead_ref_findings_carry_structural_target_value exercises only the prose-reference path (dag.py:1580, PLAN_DIRECTORY_NOT_FOUND for truly nonexistent targets). It does NOT hit dag.py:876 (EXPLICIT_DEPENDS_ON SECTION_FILE_NOT_FOUND), dag.py:1532 (stale-annotation special-home), dag.py:1555 (stale-annotation regular slug), or any of the resolve_dep / _find_section_file producers in docgen.py:46-93. No negative pin proves non-DEAD_REFERENCE findings have target_value=None. Impact: Most of the producer surface feeding _dispatch_dead_reference structurally is unpinned. A future omission at one of those construction sites regresses real corpus findings to missing target_value while the hand-constructed auto_fix matrix continues to pass. Resolved: Fixed on 2026-04-15 during §03.N close-out round-6 loop. Added 4 new matrix cells in TestClassifyDeadReference: test_explicit_depends_on_dead_ref_carries_target_value (dag.py EXPLICIT_DEPENDS_ON path via depends_on=["99"] pointing at nonexistent section; pins target_value == dep_id), test_completed_plan_body_ref_carries_target_value (dag.py stale-annotation regular-slug path via <!-- unblocks:archived-plan/02 --> pointing at plans/completed/archived-plan/), test_cross_plan_unknown_name_carries_target_value (docgen.py CROSS_PLAN_NAME_NOT_FOUND producer via depends_on=["ghost-plan#03"]), and the negative pin test_non_dead_reference_finding_has_no_target_value (asserts no classifier accidentally sets target_value on non-DEAD_REFERENCE findings). 624/624 plan-audit tests pass.
  • [TPR-03-001-gemini-r6][medium] tests/plan-audit/test_dag_classifiers.py:418 — GAP: Missing matrix cells and negative pin for dag-side target_value population. [GAP] Evidence: Same root issue as TPR-03-002-codex-r6 — the test_dead_ref_findings_carry_structural_target_value pin covers only the body-prose PLAN_DIRECTORY_NOT_FOUND path. Other three dag.py construction sites (EXPLICIT_DEPENDS_ON, stale-annotation special-home, stale-annotation regular-slug) unpinned. No negative pin for non-DEAD_REFERENCE subtypes. Impact: Silent regression if target_value population is dropped at one of the unpinned sites — would surface later as AutoFixError panic in the dispatcher rather than a targeted test failure. Resolved: Fixed on 2026-04-15. Same fix as [TPR-03-002-codex-r6] (effective agreement — different titles, same location, same root cause). Agreement (effective): [TPR-03-002-codex-r6] — same location, same root cause, different title (merger did not auto-detect agreement due to title mismatch; manually cross-referenced).
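
The target_key/target_value symmetry that the round-6 pins guard can be illustrated with a minimal sketch. The Finding class below is a hypothetical stand-in, not the code in scripts/plan_corpus/types.py: the fields subtype, path, and line, the separator, and the truncated SHA-256 scheme are all assumptions made for illustration. Only the behavior is taken from the findings above: optional fields are hashed conditionally, legacy findings keep their pre-extension id, and to_json emits null rather than omitting the field.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for the shared Finding type (assumed fields)."""

    subtype: str
    path: str
    line: int
    target_key: Optional[str] = None
    target_value: Optional[str] = None

    @property
    def id(self) -> str:
        # Base identity: what a legacy (pre-extension) finding would hash.
        parts = [self.subtype, self.path, str(self.line)]
        # Optional discriminators are appended only when set, so findings
        # without them keep exactly the same id as before the fields existed.
        if self.target_key is not None:
            parts.append(f"target_key={self.target_key}")
        if self.target_value is not None:
            parts.append(f"target_value={self.target_value}")
        digest = hashlib.sha256("\x1f".join(parts).encode("utf-8"))
        return digest.hexdigest()[:16]

    def to_json(self) -> str:
        # Both optional fields are always present in the output; unset
        # values serialize as null so consumers see a stable schema.
        return json.dumps(
            {
                "id": self.id,
                "subtype": self.subtype,
                "path": self.path,
                "line": self.line,
                "target_key": self.target_key,
                "target_value": self.target_value,
            },
            sort_keys=True,
        )
```

Under these assumptions, two DEAD_REFERENCE findings on the same line with different target_value values get distinct ids, while a finding with neither optional field set hashes only the base triple. Removing either conditional branch from id, or either key from to_json, is the kind of regression the four sync-point pins are meant to catch.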

03.N Completion Checklist

  • SafetyClass enum (SafeFix | ExposureReview), ClassifiedFinding wrapper, PreimageRecord guard, and classify_safety(finding, context) defined and tested — all OWNED here (not in plan_corpus)
  • WriteBackContext carries git signals; --quick mode bypasses its construction entirely
  • classify_safety is pure (no I/O); git signal population lives at the CLI edge; plan_corpus grep-verified to contain no subprocess or git calls
  • plan: -> name: rename guarded against collision (both keys present with different values -> ExposureReview)
  • reviewed: false insertion differentiated by schema class (PlanSection/RoadmapSection: SafeFix; PlanIndex: ExposureReview per workflow gate)
  • FM_DECLARED_VS_BODY_DERIVED is ALWAYS ExposureReview — defense-in-depth assert in auto-fix engine
  • Findings report format defined and implemented (JSON + markdown + console) — imports Finding / FindingCategory / FindingSubtype from plan_corpus, no shadow types
  • Frontmatter text patcher is the ONLY write path — PyYAML never used for output; comments, key ordering, and formatting preserved
  • Concurrent-session guards: preimage hash check, atomic write via os.replace, refuse-on-conflict
  • Dead-reference audit trail in fixes-applied.json only — NO inline HTML comments (re-scanning hazard)
  • Auto-fix engine applies only SafeFix findings; hard-asserts rejection of ExposureReview findings
  • Manual-review flagging for CONFLICT, SUPERSEDED, BLOCKED, MISSING_DEPENDENCY (intrinsically manual) + any ExposureReview-classified finding
  • Safe-fix guards: backups, logging, --dry-run, --no-auto-fix
  • Integration with /continue-roadmap via --quick mode pre-check (BLOCKED + DEAD_REFERENCE only; no CONFLICT)
  • --quick vs --full scope explicitly documented — no ambiguity on which classifiers run in each mode
  • roadmap_scan.py shadow parser migration mandated as Option A in Section 05.3 — Option B rejected per TPR
  • timeout 150 ./test-all.sh green — no regressions. Verified 2026-04-15: Rust unit tests (7724), ori_rt (367), ori_llvm (633), AOT integration (2159), Ori spec interpreter (4444) all pass with 0 failures. Ori spec (LLVM backend) crashes with signatures matching already-tracked out-of-scope bugs (BUG-04-030 stack overflow, BUG-04-039 join trampoline, BUG-04-074 []+push type var) — these live in the codegen subsystem, have their own in-progress fix sections with their own TPR/hygiene gates, and do NOT touch any code §03 introduced (§03 is pure Python plan-tooling: scripts/plan_corpus/, .claude/skills/verify-roadmap/). Per /continue-roadmap Step 2.5 scope-validation, cross-subsystem blockers do not propagate to §03 close-out. No regressions attributable to §03 work.
  • /tpr-review — dual-source review of report format, auto-fix logic, text patcher safety, concurrent-session guards. Completed 2026-04-15 across rounds 1-6. Round 6 (iter 1) ran against commit baa01833 — 3 actionable medium-severity findings (all DRIFT:missing-regression-pin on target_value), fixed in commit 809e6cf5 with 8 new plan-audit pins. Round 6 iter-2 re-review was launched but aborted by user at their discretion; iteration-1 fix accepted as terminal. Cumulative envelope evidence: all earlier rounds (1-5) achieved CLEAN PASS. See §03.R for full finding-by-finding history.
  • /impl-hygiene-review — verify auto-fix safety (no semantic changes), report completeness, no shadow parsers introduced. Resolved 2026-04-15: User override — rigor gates waived for §03.N close-out. Code-level hygiene evidence already captured: structural target_value plumbing passed 6 rounds of dual-source TPR; 14 regression pins in place (target_key + target_value symmetry at every sync point); 624/624 plan-audit tests green. No shadow parsers introduced — all frontmatter reads go through plan_corpus.load_and_validate.
  • /improve-tooling section-close sweep — verify per-subsection retrospectives ran; add cross-subsection findings. Resolved 2026-04-15: User override — rigor gate waived. No cross-subsection tooling gaps surfaced during §03 work; per-subsection retrospectives (§03.1 through §03.5) each documented “no gaps” at their original close-out.
  • /sync-claude section-close sweep — verify CLAUDE.md and rules reflect new verify-roadmap modes and integration points. Resolved 2026-04-15: User override — rigor gate waived. CLAUDE.md §Commands already documents python -m scripts.plan_corpus check / docgen and the plan-complete.py / rules-for-review.py scripts; .claude/rules/intelligence.md unaffected by §03 work. Any residual drift is low-severity doc-sync that can be handled opportunistically.
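
The concurrent-session guards named in the checklist (preimage hash check, atomic write via os.replace, refuse-on-conflict) can be sketched as follows. This is an illustration under stated assumptions, not the implemented engine: PreimageRecord is named in the checklist, but its fields here (path, sha256) are assumed, and PreimageConflict, read_with_preimage, and write_if_unchanged are hypothetical names introduced for the example.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path


class PreimageConflict(Exception):
    """Hypothetical error: the file changed after it was read."""


@dataclass(frozen=True)
class PreimageRecord:
    # Assumed shape: the file path plus the SHA-256 of the bytes read.
    path: Path
    sha256: str


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def read_with_preimage(path: Path) -> tuple[str, PreimageRecord]:
    """Read the file and remember a hash of exactly what was read."""
    data = path.read_bytes()
    return data.decode("utf-8"), PreimageRecord(path, _digest(data))


def write_if_unchanged(record: PreimageRecord, new_text: str) -> None:
    """Write new_text only if the file still matches the recorded preimage."""
    current = record.path.read_bytes()
    if _digest(current) != record.sha256:
        # Refuse-on-conflict: another session edited the file, so do nothing.
        raise PreimageConflict(f"{record.path} changed since it was read")

    # Write to a temp file in the same directory so os.replace stays on one
    # filesystem and swaps the file in a single step.
    fd, tmp_name = tempfile.mkstemp(
        dir=record.path.parent, prefix=record.path.name + ".", suffix=".tmp"
    )
    try:
        # newline="" keeps line endings exactly as they appear in new_text.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as handle:
            handle.write(new_text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, record.path)
    except BaseException:
        # Leave no stray temp file behind if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A caller would read a section file with read_with_preimage, apply the text patcher to the returned string, and pass the result to write_if_unchanged. Note that a small window remains between the hash check and os.replace; this sketch narrows the race but does not eliminate it, so a lock file or similar would be needed if stricter guarantees are required.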