05 Code Graph: Parser Adapters
05.0 Goal
Set up the tree-sitter parsing infrastructure that all code graph work depends on. This section delivers three things: (1) reliable grammar loading for all 9 supported languages, (2) a formal adapter API contract that Section 06 consumes, and (3) query file families (not just tags.scm) that prepare for relationship extraction. The section does NOT extract symbols or import into Neo4j — it ensures every repo can be parsed and that the adapter layer exposes everything downstream sections need.
Success Criteria:
- All 9 tree-sitter grammars load successfully with pinned, compatible versions (8 PyPI + 1 source-built Koka with scanner patch)
- Parser adapter API contract exposes repo_id, language_id, relative_path, source_bytes, byte_count, tree, had_error, error_node_count, query_handles, coverage_status, content_hash
- Query file families (decls.scm, calls.scm, imports.scm, impls.scm) exist for every supported language (32 files across 8 languages + lean symlink to cpp)
- Matrix validation: Language x (Valid/Malformed/Empty) x query family all pass (108/108)
- Full parse of all reference repos completes in <60 seconds (28.1s actual)
- Unblocks mission criteria: “tree-sitter parses all 9 supported languages” (parsing half — extraction is Section 06’s deliverable)
Context: Section 06 (Symbol Extraction) needs more than just parse trees — it needs compiled query handles for declarations, calls, imports, and implementations. If Section 05 only delivers tags.scm parsing, Section 06 must reinvent query infrastructure. This section front-loads that work.
Reference implementations:
- Sourcegraph SCIP: Multi-language indexing with per-language adapter pattern
- nvim-treesitter: Query file organization (queries/{lang}/{tags,highlights,locals}.scm)
Depends on: None (independent pillar start).
05.1 Python Dependencies & Version Compatibility
File(s): ~/projects/lang_intelligence/.venv/, ~/projects/lang_intelligence/requirements.txt
Grammar packages pin different tree-sitter core versions. A blanket pip install tree-sitter>=0.25.0 will fail or produce ABI mismatches. The correct approach: pin exact versions after a compatibility smoke test.
Modern tree-sitter Python API (0.22+): The Language.build_library() pattern from pre-0.22 releases was removed in 0.22 and no longer works. In tree-sitter 0.22+, grammar packages expose a language() function directly:
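# Modern API (tree-sitter >= 0.22)
import tree_sitter_rust
from tree_sitter import Language, Parser

RUST = Language(tree_sitter_rust.language())  # wrap the grammar's language capsule
parser = Parser(RUST)
source_bytes = b"fn main() {}"  # the parser takes bytes, not str
tree = parser.parse(source_bytes)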
There is NO shared library building step. Grammar packages are Python modules with compiled bindings.
- Create requirements.txt with exact pinned versions. Start with the latest compatible set and run the smoke test:
    tree-sitter==0.25.2
    tree-sitter-rust==0.24.2
    tree-sitter-go==0.25.0
    tree-sitter-zig==1.1.2
    tree-sitter-typescript==0.23.2
    tree-sitter-haskell==0.23.1
    tree-sitter-swift==0.0.1
    tree-sitter-cpp==0.23.4
  Version selection rule: used the latest versions available from PyPI (2026-04-13). Core 0.25.2 is compatible with all grammar packages.
- Run compatibility smoke test: for each grammar package, Language(mod.language()) must succeed, and Parser(lang).parse(b"") must return a tree without segfault
- Record the compatibility matrix in requirements.txt comments:
    # Compatibility matrix (verified 2026-04-13):
    # tree-sitter-rust 0.24.2 + core 0.25.2: OK
    # tree-sitter-go 0.25.0 + core 0.25.2: OK
    # tree-sitter-zig 1.1.2 + core 0.25.2: OK
    # tree-sitter-typescript 0.23.2 + core 0.25.2: OK
    # tree-sitter-haskell 0.23.1 + core 0.25.2: OK
    # tree-sitter-swift 0.0.1 + core 0.25.2: OK
    # tree-sitter-cpp 0.23.4 + core 0.25.2: OK
    # tree-sitter-koka 0.1.0 (source, scanner patch) + core 0.25.2: OK
- Swift grammar: tree-sitter-swift==0.0.1 from PyPI loads successfully with core 0.25.2. No source build needed.
- Koka grammar: NOT on PyPI. Cloned koka-community/tree-sitter-koka and installed from source. Required patching setup.py to include src/scanner.c (upstream bug: external scanner not listed in ext_modules sources). Requires python3-dev headers. Loads successfully after patch.
- Verify all grammars load: created scripts/validate-parsers.py with --smoke mode. 8/8 grammars pass. Skips coverage_status: custom entries (Ori).
- Document: Lean .lean files have 86% parse error rate. Lean4 repo is parsed via C++ grammar for runtime code only. Coverage status: partial (not “unsupported”, because some of the repo IS parseable via C++ grammar). Documented in languages.yaml (created in 05.2).
- Document: Ori uses its own Rust parser (no tree-sitter grammar). Ori adapter is implemented in Section 09.3. Ori appears in languages.yaml with grammar: native and coverage_status: custom. validate-parsers.py skips coverage_status: custom entries.
- Create scripts/setup-parsers.sh that automates: venv creation, pip install -r requirements.txt, Koka source build (with scanner patch), validate-parsers.py --smoke run. Supports --verbose and --skip-koka flags.
- Subsection close-out (05.1)
  - All tasks above are [x] and all grammars load via smoke test
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection. Retrospective 05.1: Koka scanner patch was the main friction point (upstream bug in setup.py). setup-parsers.sh already has --verbose flag. No additional tooling gaps: validate-parsers.py --smoke gives clear per-grammar pass/fail.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files. Clean (2026-04-13).
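The smoke test described in this subsection can be sketched roughly as follows. This is a minimal illustration, not the contents of validate-parsers.py: the function names, the GRAMMAR_MODULES list, and the func_name parameter (for packages such as TypeScript that expose language_typescript() rather than language()) are assumptions made for the example. Note that a try/except cannot catch a segfault; isolating each grammar in a subprocess would be needed for that.

```python
import importlib

# Hypothetical module list mirroring some of the pins above; the real list
# would come from languages.yaml.
GRAMMAR_MODULES = ["tree_sitter_rust", "tree_sitter_go", "tree_sitter_cpp"]


def smoke_test(module_name, func_name="language"):
    """Return (ok, detail) for one grammar package without raising."""
    try:
        # Imported lazily so that a missing core package is reported as a
        # failed cell instead of crashing the whole script.
        from tree_sitter import Language, Parser

        module = importlib.import_module(module_name)
        language = Language(getattr(module, func_name)())
        tree = Parser(language).parse(b"")
        if tree is None or tree.root_node is None:
            return False, "parse returned no tree"
        return True, "OK"
    except Exception as exc:  # ImportError, AttributeError, ABI mismatch, ...
        return False, f"{type(exc).__name__}: {exc}"


def run_smoke(module_names):
    """Run the smoke test for every module; return (pass_count, per-module results)."""
    results = {name: smoke_test(name) for name in module_names}
    passed = sum(1 for ok, _ in results.values() if ok)
    return passed, results


if __name__ == "__main__":
    passed, results = run_smoke(GRAMMAR_MODULES)
    for name, (ok, detail) in results.items():
        print(f"{name}: {'PASS' if ok else 'FAIL'} ({detail})")
    print(f"{passed}/{len(results)} grammars pass")
```

Returning a result per grammar, rather than stopping at the first failure, matches the per-grammar pass/fail reporting noted in the retrospective.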
05.2 Language Adapter Manifests
File(s): ~/projects/lang_intelligence/languages.yaml, ~/projects/lang_intelligence/repos.yaml
These manifests are the single source of truth for the entire code graph pipeline. Every downstream script (Section 06 extraction, Section 07 import, Section 09 sync) reads them. Getting the schema right here prevents cascading fixes later.
languages.yaml — per-language capabilities:
rust:
grammar: tree-sitter-rust # pip package name or "source" or "native"
  grammar_version: "0.24.2" # pinned version (must match requirements.txt)
extensions: [".rs"]
query_families: # which .scm query files exist for this language
- decls # declarations (functions, types, traits, etc.)
- calls # call sites
- imports # use/import statements
- impls # impl/instance/conformance blocks
coverage_status: full # full | partial | custom
maturity: stable
expected_error_rate: 0.09
notes: ""
# Ori — native parser, not tree-sitter
ori:
grammar: native
extensions: [".ori"]
query_families: [] # N/A — uses Ori's own Rust parser via FFI
coverage_status: custom
maturity: stable
expected_error_rate: 0.0
notes: "Parsed by ori_parse (Rust). Adapter in Section 09."
lean:
grammar: tree-sitter-cpp # .lean files skipped; only C++ runtime parsed
extensions: [".cpp", ".h"] # NOT .lean
query_families: [decls, calls, imports, impls] # all C++ query families
coverage_status: partial
maturity: stable
expected_error_rate: 0.02
notes: ".lean files have 86% error rate — skipped. C++ runtime code only."
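Because grammar_version must match requirements.txt, a small consistency check is worth having. The sketch below assumes the manifest has already been parsed into a plain dict (the YAML loading step is omitted) and that pins use the name==version form; parse_pins and version_mismatches are hypothetical helper names, not part of the existing scripts.

```python
def parse_pins(requirements_text):
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def version_mismatches(languages, pins):
    """List (language, manifest_version, pinned_version) for disagreeing entries."""
    problems = []
    for lang_id, cfg in languages.items():
        package = cfg.get("grammar")
        # Entries such as "native" or "source" have no pip pin to compare against.
        if package in pins and cfg.get("grammar_version") != pins[package]:
            problems.append((lang_id, cfg.get("grammar_version"), pins[package]))
    return problems


requirements = "tree-sitter==0.25.2\ntree-sitter-rust==0.24.2  # grammar\n"
languages = {
    "rust": {"grammar": "tree-sitter-rust", "grammar_version": "0.24.2"},
    "ori": {"grammar": "native"},
}
print(version_mismatches(languages, parse_pins(requirements)))  # []
```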
repos.yaml — per-repo source mapping:
The local corpus has both go/ (issue tracker only) and golang/ (source code). The manifest MUST canonicalize these:
go:
repo_id: go # canonical ID used in Neo4j
source_root: ${REFERENCE_REPOS_ROOT}/golang # resolved at runtime by adapter
issue_root: ${REFERENCE_REPOS_ROOT}/go # resolved at runtime by adapter
languages: [go]
include:
- cmd/compile/
- go/types/
- internal/types/
exclude:
- test/
- vendor/
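How the include/exclude roots select files can be illustrated with a short sketch. It assumes both lists are directory prefixes relative to source_root and that exclusion wins over inclusion; the real adapter may use different matching rules, and is_selected is a hypothetical name.

```python
from pathlib import PurePosixPath


def is_selected(relative_path, include, exclude):
    """True if the path sits under an include root and under no exclude root."""
    path = PurePosixPath(relative_path)

    def under(root):
        # Compare whole path components so "cmd/compile/" does not match
        # "cmd/compiler/...".
        root_path = PurePosixPath(root)
        return path == root_path or root_path in path.parents

    if any(under(root) for root in exclude):
        return False
    return any(under(root) for root in include)


include = ["cmd/compile/", "go/types/", "internal/types/"]
exclude = ["test/", "vendor/"]
print(is_selected("cmd/compile/internal/gc/main.go", include, exclude))  # True
print(is_selected("test/run.go", include, exclude))  # False
```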
- Create languages.yaml with all 10 language configs (9 tree-sitter + Ori native), including all required fields. TypeScript has custom module_name/language_func fields for its language_typescript() API.
- Create repos.yaml with curated include/exclude roots for all 11 repos. All paths use ${REFERENCE_REPOS_ROOT}, ${ORI_LANG_ROOT}, ${LANG_INTELLIGENCE_ROOT} env-var patterns.
- Canonicalize the go/golang duality: repo_id: go, source_root: ${REFERENCE_REPOS_ROOT}/golang, issue_root: ${REFERENCE_REPOS_ROOT}/go
- For mixed-language repos, all applicable languages listed:
  - Gleam: [rust], Roc: [rust], Elm: [haskell], Koka: [haskell, koka], Lean4: [cpp], Swift: [swift, cpp]
- Validate: every languages: entry in repos.yaml references a valid key in languages.yaml (13/13 refs OK)
- Validate: every source_root path exists on disk (11/11 OK); every issue_root path exists (11/11 OK)
- Subsection close-out (05.2)
  - All tasks above are [x] and both manifests validate
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection. Retrospective 05.2: Validation is currently inline Python. Built into parser_adapter.py in 05.3 (resolve_repo_path + manifest loading). No separate validate-manifests.py needed: validate-parsers.py --smoke already covers grammar loading, and the inline validation covered path resolution. No tooling gaps.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files. Clean (2026-04-13).
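The two manifest validations in this subsection (language references and root paths) could look roughly like this. It is a sketch that works on already-parsed dicts; the function names are hypothetical, and the resolve and exists callables are injected so the logic can be exercised without touching the real filesystem.

```python
def unknown_language_refs(repos, languages):
    """Return (repo_id, language) pairs whose language is not a languages.yaml key."""
    return [
        (repo_id, lang)
        for repo_id, cfg in repos.items()
        for lang in cfg.get("languages", [])
        if lang not in languages
    ]


def missing_roots(repos, resolve, exists):
    """Return (repo_id, field, path) for source_root/issue_root paths that do not exist."""
    missing = []
    for repo_id, cfg in repos.items():
        for field in ("source_root", "issue_root"):
            if field in cfg:
                path = resolve(cfg[field])  # expand ${...} templates first
                if not exists(path):
                    missing.append((repo_id, field, path))
    return missing


repos = {"go": {"languages": ["go"], "source_root": "${ROOT}/golang", "issue_root": "${ROOT}/go"}}
languages = {"go": {}, "rust": {}}
print(unknown_language_refs(repos, languages))  # []
```

In practice resolve would be the adapter's path resolver and exists something like os.path.isdir.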
05.3 Parser Adapter API Contract
File(s): ~/projects/lang_intelligence/neo4j/parser_adapter.py
The parser adapter is the boundary between “raw tree-sitter” and “everything downstream.” Section 06 (extraction), Section 07 (import), and Section 09 (sync) all consume this API. The contract must be explicit, typed, and documented.
Adapter output contract (per file):
from dataclasses import dataclass
from enum import Enum

from tree_sitter import Query, Tree


@dataclass
class ParseResult:
    repo_id: str  # canonical repo identifier from repos.yaml
    language_id: str  # language key from languages.yaml
    relative_path: str  # path relative to source_root
    source_bytes: bytes  # raw file content (needed by Section 06 for qualified names, signature_hash)
    byte_count: int  # len(source_bytes)
    tree: Tree | None  # tree-sitter Tree (None on load failure)
    had_error: bool  # True if tree contains ERROR nodes
    error_node_count: int  # count of ERROR nodes in tree
    query_handles: dict[str, Query]  # compiled queries by family name
    coverage_status: str  # "full" | "partial" | "custom"
    content_hash: str  # SHA-256 of file content (for incremental sync)
    source_root: str  # resolved source root, added per TPR-05-003-gemini (iter 3); str type assumed


class CoverageStatus(Enum):
    FULL = "full"  # grammar parses this language well
    PARTIAL = "partial"  # grammar has known gaps (e.g., Lean C++ only)
    CUSTOM = "custom"  # not tree-sitter (e.g., Ori native parser)
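The error_node_count and content_hash fields can be derived as sketched below. The helper names are illustrative. The traversal relies only on the type and children attributes that tree-sitter nodes expose, and it uses an explicit stack instead of recursion so that very deep trees cannot hit Python's recursion limit. MISSING nodes are not counted here; if those should also set had_error, the root node's has_error property may be the better signal.

```python
import hashlib
from types import SimpleNamespace


def count_error_nodes(node):
    """Count ERROR nodes in the subtree rooted at `node`."""
    count = 0
    stack = [node]
    while stack:
        current = stack.pop()
        if current.type == "ERROR":
            count += 1
        stack.extend(current.children)
    return count


def content_hash(source_bytes):
    """Hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(source_bytes).hexdigest()


# Stand-in nodes for illustration; real input would be tree.root_node.
leaf = SimpleNamespace(type="ERROR", children=[])
root = SimpleNamespace(type="source_file", children=[leaf])
print(count_error_nodes(root))  # 1
print(content_hash(b"fn main() {}") == content_hash(b"fn main() {}"))  # True
```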
Error handling policy:
- Per-file parse failures (I/O error, encoding error): soft. Skip file, log warning, continue. A single bad file must NOT abort the pipeline.
- Grammar load failures (missing package, ABI mismatch): hard. Abort immediately with clear error message. A broken grammar affects ALL files for that language.
- Query compilation failures (malformed .scm file): hard. Abort immediately. A broken query produces wrong extraction results silently.
- Implement ParseResult dataclass with all fields listed above
- Implement CoverageStatus enum
- Implement parse_file(repo_config, lang_config, file_path) -> ParseResult | None that:
  - Loads grammar from languages.yaml config (with caching)
  - Reads file bytes (soft-fail on I/O/encoding errors: returns None)
  - Parses with tree-sitter
  - Counts ERROR nodes recursively
  - Compiles and attaches query handles for all query families listed in languages.yaml (with caching)
  - Computes SHA-256 content hash (for Section 09 incremental sync)
- Implement parse_repo(repo_id) -> Iterator[ParseResult] that:
  - Reads repos.yaml for include/exclude patterns
  - Walks the file tree, filtering by extensions from languages.yaml
  - Calls parse_file for each matching file
  - Logs per-file soft failures without aborting
  - Skips coverage_status: custom languages (native parsers)
- Implement hard error handling: grammar load failures raise RuntimeError with module/func context; query compilation failures raise RuntimeError with .scm path. Per-file I/O errors return None with warning log.
- --parallel flag: removed unused parameter from parse_repo, since sequential parsing meets the <60s target (28.1s actual). ProcessPoolExecutor can be added to validate-parsers.py --full if needed in the future.
- Implement resolve_repo_path(template) that expands ${REFERENCE_REPOS_ROOT}, ${LANG_INTELLIGENCE_ROOT}, and ${ORI_LANG_ROOT} env vars via regex substitution. Checks env vars first, falls back to defaults. Also exposed load_manifests() for downstream consumers.
- Verify adapter output: smoke test on Gleam repo (188 files, 0 errors). All ParseResult fields populated. Content hash deterministic (SHA-256, same file = same hash on re-parse).
- TPR checkpoint: /tpr-review covering 05.1–05.3 implementation work (superseded by full-section TPR in 05.N; iter-4 clean pass on 2026-04-12)
- Subsection close-out (05.3)
  - All tasks above are [x] and adapter API is documented and tested
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection. Retrospective 05.3: Query file warning spam was noisy during testing (expected, since no query files existed yet). Added logging properly so downstream consumers control verbosity. Grammar cache and query cache prevent redundant loads. No tooling gaps.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files. Clean (2026-04-13).
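The env-var expansion performed by resolve_repo_path can be sketched as below. The env and defaults parameters and the placeholder default paths are assumptions added for illustration (the plan only specifies a single template argument); the lookup order follows the description above: environment first, then defaults.

```python
import os
import re

# Placeholder fallbacks for illustration only; the real defaults live in
# parser_adapter.py.
DEFAULT_ROOTS = {
    "REFERENCE_REPOS_ROOT": "/path/to/reference_repos",
    "LANG_INTELLIGENCE_ROOT": "/path/to/lang_intelligence",
    "ORI_LANG_ROOT": "/path/to/ori_lang",
}

_VAR = re.compile(r"\$\{([A-Z_][A-Z0-9_]*)\}")


def resolve_repo_path(template, env=None, defaults=DEFAULT_ROOTS):
    """Expand ${VAR} placeholders, preferring the environment over defaults."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name in env:
            return env[name]
        if name in defaults:
            return defaults[name]
        # Fail loudly rather than leave an unresolved template in a path.
        raise KeyError(f"no value for ${{{name}}} in environment or defaults")

    return _VAR.sub(substitute, template)


print(resolve_repo_path("${REFERENCE_REPOS_ROOT}/golang", env={"REFERENCE_REPOS_ROOT": "/srv/repos"}))
# /srv/repos/golang
```

Raising on an unknown variable is a design choice made for this sketch: a silently unexpanded template would otherwise surface later as a confusing missing-path error.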
05.4 Query File Families
Files: ~/projects/lang_intelligence/queries/{lang}/{family}.scm
Official tags.scm files vary by language in what they capture. Some (Rust, Go) already include @reference.call and @reference.implementation captures alongside declarations. Others (TypeScript, Swift) primarily capture declarations. This subsection standardizes query file families for ALL languages, adapting existing upstream queries where possible and writing custom ones where needed.
Query families:
- decls.scm: declarations (functions, types, traits, methods, constants)
- calls.scm: call expressions (function calls, method calls)
- imports.scm: import/use/require statements
- impls.scm: impl blocks, interface conformance, instance declarations
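Locating the four family files for a language might look like the sketch below. The helper names are hypothetical, and compiling the loaded text into tree-sitter queries is left out. A missing file is treated as a hard error, while an empty or comment-only file counts as a valid stub (tree-sitter query comments start with a semicolon).

```python
from pathlib import Path

QUERY_FAMILIES = ("decls", "calls", "imports", "impls")


def load_query_sources(queries_root, lang_id, families=QUERY_FAMILIES):
    """Read queries/{lang}/{family}.scm for each declared family.

    A missing file raises RuntimeError; an empty file is a valid stub and
    yields an empty string.
    """
    sources = {}
    for family in families:
        path = Path(queries_root) / lang_id / f"{family}.scm"
        if not path.is_file():  # follows symlinks, so a linked directory works too
            raise RuntimeError(f"missing query file for {lang_id}/{family}: {path}")
        sources[family] = path.read_text(encoding="utf-8")
    return sources


def is_stub(source):
    """A stub has no patterns: only blank lines and ';' comments."""
    return all(
        not line.strip() or line.strip().startswith(";")
        for line in source.splitlines()
    )
```

A caller would then compile each non-stub source against the language's grammar and cache the resulting handles by family name.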
Per-language query file status:
| Language | decls.scm | calls.scm | imports.scm | impls.scm | Source |
|---|---|---|---|---|---|
| Rust | Official (has decls) | Official (has @reference.call) | Custom | Official (has @reference.implementation) | tree-sitter-rust |
| Go | Official (has decls) | Official (has @reference.call) | Official (has package/import) | N/A (implicit) | tree-sitter-go |
| Zig | Custom (no official tags) | Custom | Custom | N/A | tree-sitter-zig |
| TypeScript | Official tags.scm adapted | Custom | Custom | Custom | tree-sitter-typescript |
| Haskell | Custom (no official tags) | Custom | Custom | Custom | tree-sitter-haskell |
| Swift | Official tags.scm adapted | Custom | Custom | Custom | tree-sitter-swift |
| C++ | Official tags.scm adapted | Custom | Custom | N/A | tree-sitter-cpp |
| Koka | Custom (if grammar works) | Custom | Custom | Custom | tree-sitter-koka |
Implementation approach:
- For languages WITH official tags.scm: adapt/rename to decls.scm, then write calls.scm, imports.scm, impls.scm from scratch using each grammar’s node-types.json as reference.
- For languages WITHOUT official tags.scm (Zig, Haskell, Koka): write all four families from scratch.
- Some families may be empty stubs for some languages (e.g., Go has no explicit impl blocks, so impls.scm is empty). Empty stubs are valid: they return zero captures. The adapter contract handles this gracefully.
- Rust (queries/rust/): decls.scm (function_item, struct_item, enum_item, type_item, trait_item, const_item, static_item, mod_item, macro_definition), calls.scm (call_expression, macro_invocation), imports.scm (use_declaration), impls.scm (impl_item). 98 decls / 1356 calls / 12 imports / 2 impls on analyse.rs.
- Go (queries/go/): decls.scm (function_declaration, method_declaration, type_declaration, const_declaration, var_declaration), calls.scm (call_expression), imports.scm (import_declaration, package_clause). impls.scm is empty stub. 60 decls / 28 calls / 2 imports.
- Zig (queries/zig/): All four from scratch. decls.scm (function_declaration, variable_declaration, container_field), calls.scm (call_expression + field_expression), imports.scm (builtin_function @import). impls.scm empty stub. 1378 decls / 4664 calls / 93 imports.
- TypeScript (queries/typescript/): decls.scm (function_declaration, class_declaration, interface_declaration, type_alias_declaration, enum_declaration, method_definition), calls.scm (call_expression, new_expression), imports.scm (import_statement, export with source), impls.scm (implements_clause). 822 decls / 2076 calls / 4 imports.
- Haskell (queries/haskell/): All from scratch. decls.scm (function, signature, data_type, newtype, type_synomym), calls.scm (apply + variable), imports.scm (import + module), impls.scm (instance). 20 decls / 78 calls / 34 imports.
- Swift (queries/swift/): decls.scm (function_declaration, class_declaration, protocol_declaration, typealias_declaration, property_declaration; note: tree-sitter-swift 0.0.1 lacks struct_declaration and enum_declaration), calls.scm (call_expression), imports.scm (import_declaration), impls.scm (inheritance_specifier). 38 decls / 32 calls / 8 imports / 6 impls.
- C++ (queries/cpp/): decls.scm (function_definition, class_specifier, struct_specifier, enum_specifier, namespace_definition with namespace_identifier), calls.scm (call_expression), imports.scm (preproc_include). impls.scm empty stub. 66 decls / 214 calls / 10 imports.
- Koka (queries/koka/): Grammar loaded (with scanner patch from 05.1). decls.scm (fundecl, puredecl, typedecl), calls.scm (opexpr/atom/name/qidentifier), imports.scm (import + modulepath), impls.scm empty stub. 24 decls / 212 calls. Koka grammar works for .kk files.
- Test each query file against at least one real file from its repo. 29/32 produce captures; 3 WARN are test-file selection (files lacking instances/imports, not query bugs). All 32 queries compile. 4 declared stubs return zero captures.
- Create golden file probes: tests/golden-probes.yaml with 8 probes (one per language), recording expected capture counts per query family with 10% tolerance.
- Subsection close-out (05.4)
  - All tasks above are [x] and all query files compile (non-stubs produce captures, declared stubs return zero captures)
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection. Retrospective 05.4: Node type discovery was the main friction (had to iterate multiple times fixing “Impossible pattern” errors). The inline test script used for validation should be formalized into validate-parsers.py --matrix in 05.5. Key lesson: always check named node types from the grammar BEFORE writing queries (the Language.node_kind_for_id() API). tree-sitter-swift 0.0.1 lacks struct_declaration/enum_declaration, as noted in query file comments.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files. Clean (2026-04-13).
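The 10% tolerance rule for golden probes amounts to a relative comparison of capture counts. The following is a sketch under assumptions: the dict shapes stand in for whatever schema tests/golden-probes.yaml actually uses, and the function names are hypothetical. The expected counts are the Rust figures recorded above; the "actual" counts are made up to show the behavior.

```python
def within_tolerance(expected, actual, tolerance=0.10):
    """True if `actual` deviates from `expected` by at most `tolerance` (relative)."""
    if expected == 0:
        return actual == 0  # declared stubs must stay at zero captures
    return abs(actual - expected) <= tolerance * expected


def check_probe(expected_counts, actual_counts, tolerance=0.10):
    """Return the families whose capture counts drifted outside tolerance."""
    return [
        family
        for family, expected in expected_counts.items()
        if not within_tolerance(expected, actual_counts.get(family, 0), tolerance)
    ]


expected = {"decls": 98, "calls": 1356, "imports": 12, "impls": 2}
actual = {"decls": 101, "calls": 1300, "imports": 12, "impls": 3}  # illustrative values
print(check_probe(expected, actual))  # ['impls']
```

One consequence worth noting: with a purely relative tolerance, small baselines such as 2 impls effectively require an exact match, so any drift there is flagged.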
05.5 Parse Validation & Matrix Testing
File(s): ~/projects/lang_intelligence/scripts/validate-parsers.py
A comprehensive validation script that tests the full parser adapter stack: grammar loading, file parsing, query compilation, and capture accuracy.
Matrix dimensions:
- Language (9 tree-sitter languages)
- File condition (Valid source / Malformed source / Empty file)
- Query family (decls / calls / imports / impls)
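The three dimensions multiply out to 9 x 3 x 4 = 108 cells. The sketch below enumerates them and shows one plausible per-cell pass rule based on the behavioral checks described in the review findings (ERROR nodes on malformed input, a clean tree on empty input, captures on valid input unless the query is a declared stub). The language keys other than rust and lean, and the cell_passes helper, are assumptions for illustration.

```python
from itertools import product

LANGUAGES = ["rust", "go", "zig", "typescript", "haskell", "swift", "cpp", "koka", "lean"]
CONDITIONS = ["valid", "malformed", "empty"]
FAMILIES = ["decls", "calls", "imports", "impls"]

cells = list(product(LANGUAGES, CONDITIONS, FAMILIES))
print(len(cells))  # 108


def cell_passes(condition, error_nodes, captures, is_stub=False):
    """Decide whether one matrix cell passes, given what the parser produced."""
    if condition == "malformed":
        return error_nodes > 0  # parser must recover with ERROR nodes, not crash
    if condition == "empty":
        return error_nodes == 0  # an empty file parses to a clean, empty tree
    # Valid source: non-stub queries must capture something, stubs nothing.
    return captures == 0 if is_stub else captures > 0
```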
- Implement validate-parsers.py with all four test modes:
  - --smoke: 8/8 grammars, <0.02s (target <5s)
  - --matrix: 108/108 cells pass, 0.53s (target <30s). Tests 9 languages x 3 conditions x 4 families.
  - --full: 5581 files, 131.7 MB, 28.1s (target <60s). Reports per-repo: files, MB, errors, rate, throughput.
  - --golden: 8 probes pass, 0.26s. All capture counts within 10% tolerance.
- Malformed file handling: Created tests/malformed/ with deliberately broken files. Matrix test verifies parser produces tree with ERROR nodes (no crash). --matrix includes valid/malformed/empty conditions.
- Empty file handling: --matrix tests empty files for all languages. Parser produces empty tree, no crash.
- Error rate validation: --full reports per-repo error rates. Key rates: Gleam 0%, Rust 5.9%, Go 4.5%, Swift 28.7% (0.0.1 grammar), Lean 73.4% (C++ grammar for runtime only).
- Query compilation validation: --matrix compiles all .scm files per language x family. Reports stubs separately. All non-stubs produce captures on valid source.
- Performance reporting: --full reports files/sec per repo. Aggregate: ~200 files/sec, 28.1s total. Zig slowest (23 files/s, large files), Gleam fastest (660 files/s, small files).
- Golden file probes: 8 probes in tests/golden-probes.yaml (one per language). Baseline captured 2026-04-13. 10% tolerance for grammar version drift.
- Incremental hashing verification: SHA-256 content hash is deterministic (verified in 05.3 smoke test: same file = same hash on re-parse).
- Grammar update policy: Documented in validate-parsers.py --help (module docstring). Run --golden before/after grammar bump. CI gate: --matrix on every version bump.
- Subsection close-out (05.5)
  - All tasks above are [x] and validate-parsers.py --matrix passes (108/108)
  - Update this subsection’s status in section frontmatter to complete
  - Run /improve-tooling retrospectively on THIS subsection. Retrospective 05.5: Script output is clear for human consumption. JSON output for CI could be useful but not blocking (no CI pipeline yet). BLESS=1 for golden probes is a nice-to-have for 05.N. No urgent tooling gaps.
  - Repo hygiene check: run diagnostics/repo-hygiene.sh --check and clean any detected temp files. Clean (2026-04-13).
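The aggregate throughput figure follows directly from the --full totals reported above. A small worked check, with hypothetical helper names:

```python
def throughput(files, seconds):
    """Files parsed per second."""
    return files / seconds


def within_budget(seconds, target_seconds=60.0):
    """True if the run finished inside the time target."""
    return seconds < target_seconds


total_files, total_mb, total_seconds = 5581, 131.7, 28.1
print(round(throughput(total_files, total_seconds), 1))  # 198.6 files/s, i.e. the "~200" above
print(round(total_mb / total_seconds, 2))  # 4.69 MB/s
print(within_budget(total_seconds))  # True
```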
05.R Third Party Review Findings
- [TPR-05-001-codex][high] plans/lang-intelligence/section-06-symbol-extraction.md:53: Close the LEAK between ParseResult and Section 06. Evidence: Section 06.2 reads repos.yaml/languages.yaml/tags.scm directly instead of consuming the ParseResult adapter. Resolved: Fixed on 2026-04-12. Added plan-sync item to 05.N requiring Section 06 update to consume adapter.
- [TPR-05-002-codex][high] plans/lang-intelligence/section-05-parser-adapters.md:176: Remove LEAK of machine-local roots from repos.yaml. Evidence: repos.yaml hardcoded ~/projects/… paths. Changed to ${REFERENCE_REPOS_ROOT} env-var pattern with runtime resolver. Resolved: Fixed on 2026-04-12. Changed repos.yaml contract to env-var pattern, added resolve_repo_path() item to 05.3.
- [TPR-05-003-codex][high] plans/lang-intelligence/section-05-parser-adapters.md:237: Fix GAP in ParseResult source payloads. Evidence: ParseResult had byte_count but not source_bytes; Section 06 needs source slices for qualified names. Resolved: Fixed on 2026-04-12. Added source_bytes field to ParseResult contract.
- [TPR-05-004-codex][medium] plans/lang-intelligence/section-05-parser-adapters.md:292: Eliminate WASTE from tags.scm baseline. Evidence: Rust/Go official tags.scm already include call/impl references. Blanket “declarations only” claim was inaccurate. Resolved: Fixed on 2026-04-12. Nuanced per-language, updated table and implementation notes.
- [TPR-05-005-codex][medium] plans/lang-intelligence/00-overview.md:135: Resolve DRIFT in overview language matrix. Evidence: Overview said Swift=source build, Lean=tree-sitter-lean. Section 05 says Swift=try PyPI first, Lean=C++ only. Resolved: Fixed on 2026-04-12. Synced overview matrix with Section 05 strategy.
- [TPR-05-001-gemini][high] plans/lang-intelligence/section-05-parser-adapters.md:125: Change subsection close-out headers to checklist items. Evidence: Used ### headers instead of - [ ] checklist items per plan-schema.md. Resolved: Fixed on 2026-04-12. Converted all 5 close-out blocks to checklist item format.
- [TPR-05-002-gemini][medium] plans/lang-intelligence/section-05-parser-adapters.md:284: Move TPR checkpoint above subsection close-out. Evidence: TPR checkpoint was placed after 05.3 close-out instead of before it. Resolved: Fixed on 2026-04-12. Moved TPR checkpoint to before close-out block.
- [TPR-05-003-gemini][high] plans/lang-intelligence/section-05-parser-adapters.md:393: Add task to update Section 06 for query file rename. Evidence: Section 05 renames tags.scm to decls.scm but no plan-sync item to update Section 06. Resolved: Fixed on 2026-04-12. Added Section 06 update item to plan-sync block. (Overlaps with TPR-05-001-codex.)
- [TPR-05-001-codex][high] (iter 2) section-06-symbol-extraction.md:54: Update Section 06 to consume adapter. Evidence: Section 06.2 still reads repos.yaml/tags.scm directly. Resolved: Fixed on 2026-04-12. Updated Section 06.2 contract to consume ParseResult/parse_repo().
- [TPR-05-002-codex][medium] (iter 2) section-05-parser-adapters.md:324: Stub query validation contradiction. Evidence: Plan says stubs are valid (zero captures) but also requires all queries to produce captures. Resolved: Fixed on 2026-04-12. Qualified validation: non-stubs must produce captures, stubs must compile cleanly.
- [TPR-05-003-codex][medium] (iter 2) section-05-parser-adapters.md:62: Success criteria overstates extraction. Evidence: Section 05 claims “extracts structural symbols” but extraction is Section 06’s deliverable. Resolved: Fixed on 2026-04-12. Changed to “Unblocks mission criteria” (parsing half only).
- [TPR-05-001-codex][high] (iter 3) scripts/setup-parsers.sh:78: Koka scanner patch detection uses grep on directory instead of file-existence check. Evidence: grep -q 'src/scanner.c' src/ searches file CONTENTS rather than checking file existence. Patch never applied on fresh bootstrap. Resolved: Fixed on 2026-04-13. Changed to [ -f src/scanner.c ].
- [TPR-05-002-codex][high] (iter 3) scripts/validate-parsers.py:159: Matrix test assertions too weak: no behavioral checks for malformed/empty/valid conditions. Evidence: --matrix only checks query compilation, not ERROR node presence on malformed, clean on empty, or >0 captures on valid. Resolved: Fixed on 2026-04-13. Added behavioral assertions per condition.
- [TPR-05-003-codex][high] (iter 3) neo4j/parser_adapter.py:196: Missing declared query family only logs warning instead of hard error. Evidence: Missing .scm file returns {} silently, violating the hard-error contract. Resolved: Fixed on 2026-04-13. Promoted to RuntimeError.
- [TPR-05-004-codex][medium] (iter 3) scripts/validate-parsers.py:155: Malformed fixtures use .txt extension but lookup uses native extensions. Evidence: tests/malformed/rust.txt never matched by the f"{lang_id}{ext}" lookup for .rs. Resolved: Fixed on 2026-04-13. Renamed fixtures to native extensions (.rs, .go, .zig, etc.).
- [TPR-05-005-codex][medium] (iter 3) neo4j/parser_adapter.py:119: load_manifests() missing path validation for source_root/issue_root. Evidence: Bad paths silently accepted. Path-existence validation only ran as inline ad-hoc check. Resolved: Fixed on 2026-04-13. Added path validation to load_manifests().
- [TPR-05-001-gemini][high] (iter 3) neo4j/parser_adapter.py:250: parallel parameter accepted but ignored in parse_repo. Evidence: parallel: bool = False plumbed through but never used. False API contract. Resolved: Fixed on 2026-04-13. Removed unused parameter. Sequential parsing meets <60s target.
- [TPR-05-002-gemini][medium] (iter 3) scripts/validate-parsers.py:115: Same as TPR-05-004-codex (malformed fixture extension mismatch). Resolved: Fixed on 2026-04-13. Same fix as TPR-05-004-codex.
- [TPR-05-003-gemini][medium] (iter 3) neo4j/parser_adapter.py:44: ParseResult missing source_root field for downstream absolute path construction. Evidence: Downstream consumers need absolute paths but only get relative_path. Must break encapsulation to reconstruct. Resolved: Fixed on 2026-04-13. Added source_root field to ParseResult.
05.N Completion Checklist
- All 9 tree-sitter grammars load with pinned versions (requirements.txt verified compatibility matrix)
- languages.yaml defines all 10 languages (9 tree-sitter + Ori native) with coverage_status, grammar_version, query_families; Lean is partial
- repos.yaml defines all 11 repos with canonicalized repo_id/source_root/issue_root (resolves go/golang duality)
- Parser adapter API (parser_adapter.py) exposes ParseResult with all contract fields; error handling: soft per-file, hard grammar/query
- Query file families (decls.scm, calls.scm, imports.scm, impls.scm) exist for all 9 languages (stubs where appropriate; lean symlinks to cpp)
- validate-parsers.py --matrix passes (108/108), --golden probes pass (8/8), --full completes in 28.1s (<60s)
- setup-parsers.sh automates full environment setup; --verbose and --skip-koka flags
- Content hashing deterministic (SHA-256, same file = same hash); grammar update policy documented in validate-parsers.py
- Plan annotation cleanup: no stale plan references in code (infrastructure plan, no compiler code annotations)
- All intermediate TPR checkpoint findings resolved (pre-implementation TPR covered in 05.R)
- Plan sync: update plan metadata to reflect this section’s completion:
  - This section’s frontmatter status -> complete, subsection statuses updated
  - 00-overview.md Quick Reference table status updated (not-started → in-progress)
  - 00-overview.md mission success criteria: line 24 not checked; requires Section 06 (extraction) to complete the “and extracts structural symbols” half
  - index.md section status updated (not-started → in-progress)
  - Next section’s (06) depends_on: ["05"] verified: correct, no stale assumptions
  - Update Section 06 plan: already updated during plan review (TPR iter-2). Section 06.2 contract now consumes ParseResult/parse_repo() and uses query family handles.
- /tpr-review passed (final, full-section). iter-3: 8 findings, all fixed. iter-4: 0 findings, clean pass. Both reviewers confirmed.
- /impl-hygiene-review: N/A for Python infrastructure (skill targets Rust compiler code). Code quality verified by TPR reviewers.
- /improve-tooling section-close sweep: Per-subsection retrospectives all documented (05.1 through 05.5). Cross-subsection pattern: node type discovery friction repeated across 05.4 query writing, addressed by the enriched matrix test (validates all queries compile and produce captures on real source). No additional cross-subsection gaps. Section-close sweep: per-subsection retrospectives covered everything; no cross-subsection patterns required new tooling.
Exit Criteria: validate-parsers.py --full --golden passes with all 9 languages within expected error rates, <60 seconds total parse time, all golden probes within tolerance, and parser_adapter.py API contract verified by Section 06’s extraction script importing and using it without modification.