Section 04: IR Parser Hardening
Status: Not Started
Goal: Fix known parsing failures so all 12 journeys produce complete, accurate parse results with zero parse_errors. After this section, J8 (generics) parses all monomorphized functions.
Context: The ir_parser.py regex _FUNC_NAME_RE = re.compile(r'@([\w.$]+)\s*\(') cannot parse quoted LLVM function names like @"_ori_first$24m$24int_int" produced by monomorphized generics. J8 works around this (enough unquoted functions exist to compute scores), but it’s a known gap documented in plans/code-journeys/overview.md.
WARNING (file size): ir_parser.py is already 503 lines, at the limit. Adding quoted name support (+20 lines) and invoke handling (+30 lines) will push it to ~550+ lines. As a prerequisite step for this section, split ir_parser.py into:
- ir_parser.py (~300 lines): Module/Function/Block data classes + parse_module() entry point
- ir_parser_internal.py (~250 lines): regex patterns, line-level parsing helpers, instruction classification
This split also makes it easier to test individual parsing functions in isolation.
Depends on: Nothing (independent fix).
04.0 Prerequisite: Split ir_parser.py
File(s): .claude/skills/code-journey/ir_parser.py (split into two files)
- Split ir_parser.py into ir_parser.py (data classes + public API) and ir_parser_internal.py (regex patterns + line-level helpers)
- Update all imports in arc_metrics.py, attribute_metrics.py, control_flow_metrics.py, instruction_metrics.py, extract-metrics.py. These import from ir_parser, which should remain the public API module
- Verify python3 -m pytest tests/test_ir_parser.py passes after split
04.1 Quoted Function Names
File(s): .claude/skills/code-journey/ir_parser.py
LLVM uses quoted names when the identifier contains characters not valid in bare identifiers. Ori’s mangling uses $ which is valid, but some backends quote names with special characters.
- Update _FUNC_NAME_RE to handle both bare and quoted names:

      # Before:
      _FUNC_NAME_RE = re.compile(r'@([\w.$]+)\s*\(')
      # After: handles @name( and @"name"(
      # Note: use [\w.$]+ for bare names (not \S+?, which would also accept
      # quotes and other non-identifier characters)
      _FUNC_NAME_RE = re.compile(r'@(?:"([^"]+)"|([\w.$]+))\s*\(')

- Update _parse_function_header() to extract from the correct capture group:

      name_match = _FUNC_NAME_RE.search(stripped)
      if not name_match:
          return None
      # Group 1 = quoted name, Group 2 = bare name
      func_name = name_match.group(1) or name_match.group(2)
      raw_name = f'@"{func_name}"' if name_match.group(1) else f"@{func_name}"

- Fix name field for quoted functions: The Function.name field (without @) must strip quotes too. With the proposed regex, group(1) already gives the unquoted name. But raw_name.lstrip('@') in _parse_function_decl produces '"name"' (with quotes) for quoted names. Use func_name (from the regex group) directly instead of stripping from raw_name. This affects both _parse_function_def and _parse_function_decl since they share _parse_function_header.
- Fix property predicates for quoted names: Properties that use raw_name.startswith(...) break with quoted names (e.g., @"_ori_..." does not start with @_ori_). Fix each to use self.name (without @ or quotes):
  - is_user_function: self.raw_name.startswith("@_ori_") fails for @"_ori_...". Fix: use self.name.startswith("_ori_")
  - is_runtime_decl: self.raw_name.startswith("@ori_") fails for @"ori_...". Fix: use self.name.startswith("ori_")
  - is_entry_called: self.raw_name == "@_ori_main" is OK (main is never quoted)
  - is_llvm_intrinsic: self.raw_name.startswith("@llvm.") is OK (intrinsics are never quoted)
- Add test with quoted name IR:

      def test_quoted_function_name():
          ir = 'define fastcc i64 @"_ori_first$24m$24int_int"(i64 %0) {\nentry:\n ret i64 %0\n}\n'
          module = parse_module(ir)
          assert '@"_ori_first$24m$24int_int"' in module.functions
          func = module.functions['@"_ori_first$24m$24int_int"']
          assert func.name == '_ori_first$24m$24int_int'  # No quotes in .name
          assert func.is_user_function  # Starts with _ori_
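The header and predicate changes above can be tried out together without the real parser. The following is a minimal sketch, not the actual ir_parser.py code: FunctionHeader and parse_header are hypothetical stand-ins for Function and _parse_function_header(), and is_runtime_decl is simplified to the prefix check only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern proposed in this section: group 1 = quoted name, group 2 = bare name.
_FUNC_NAME_RE = re.compile(r'@(?:"([^"]+)"|([\w.$]+))\s*\(')


@dataclass
class FunctionHeader:
    """Hypothetical stand-in for the parser's Function class."""

    name: str      # no '@' and no quotes
    raw_name: str  # as written in the IR, e.g. '@foo' or '@"foo"'

    @property
    def is_user_function(self) -> bool:
        # Test the unquoted name so '@"_ori_..."' is recognized too.
        return self.name.startswith("_ori_")

    @property
    def is_runtime_decl(self) -> bool:
        # Simplified: prefix check only.
        return self.name.startswith("ori_")


def parse_header(line: str) -> Optional[FunctionHeader]:
    """Extract name and raw_name from a define/declare line, or None."""
    match = _FUNC_NAME_RE.search(line.strip())
    if match is None:
        return None
    quoted, bare = match.group(1), match.group(2)
    if quoted is not None:
        return FunctionHeader(name=quoted, raw_name=f'@"{quoted}"')
    return FunctionHeader(name=bare, raw_name=f"@{bare}")
```

Taking name straight from the capture group means no later code has to strip '@' or quotes from raw_name, so the predicates behave the same for both spellings.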
04.2 Multi-line Instruction Handling
File(s): .claude/skills/code-journey/ir_parser.py
The multi-line switch fix (already done) should be generalized for other multi-line constructs.
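As a starting point for that generalization, continuation-line joining could look like the sketch below. It is an assumption about the approach, not a copy of the existing switch fix: join_continuation_lines is a hypothetical helper, and it treats a line as a continuation when the previous logical line has an unclosed '[' or when the line starts with a clause keyword (to label, catch, filter, or a bare cleanup).

```python
from typing import Iterable, List

# Assumption: a line starting with one of these keywords continues the
# previous instruction (invoke targets, landingpad clauses).
_CONTINUATION_PREFIXES = ("to label ", "catch ", "filter ")


def join_continuation_lines(lines: Iterable[str]) -> List[str]:
    """Merge wrapped instruction lines into single logical lines.

    Blank lines are dropped and every line is stripped of indentation.
    """
    joined: List[str] = []
    open_brackets = 0  # unclosed '[' in the current logical line
    for raw in lines:
        stripped = raw.strip()
        if not stripped:
            continue
        is_continuation = bool(joined) and (
            open_brackets > 0
            or stripped == "cleanup"  # landingpad clause, not the label 'cleanup:'
            or stripped.startswith(_CONTINUATION_PREFIXES)
        )
        if is_continuation:
            joined[-1] = f"{joined[-1]} {stripped}"
        else:
            joined.append(stripped)
            open_brackets = 0
        open_brackets += stripped.count("[") - stripped.count("]")
    return joined
```

Bracket counting is a heuristic: a string constant containing an unbalanced '[' would confuse it, so the real pass may need to restrict it to lines inside function bodies.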
- Audit for other multi-line patterns in LLVM IR:
  - phi with many incoming values (wraps across lines)
  - landingpad with multiple catch/filter clauses
  - invoke with long to/unwind labels that may wrap
- If any are found in journey IR, add continuation-line joining (same pattern as switch fix)
- Parse invoke as a first-class instruction: invoke has a different syntax from call and is currently not handled by the RC counting regexes (_RC_INC_RE/_RC_DEC_RE only match call, not invoke; _RC_INVOKE_RE exists but is never used for balance counting):

      %result = invoke fastcc i64 @func(i64 %0) to label %normal unwind label %cleanup

  The parser must:
  - Recognize invoke as an opcode (currently it does extract it as opcode, but the downstream consumers ignore it)
  - Extract the callee name from invoke instructions (same as call)
  - Extract the to label %X unwind label %Y targets for CFG construction

      _INVOKE_RE = re.compile(
          r'invoke\b.*@(?:"([^"]+)"|([\w.$]+))\s*\('  # callee name: quoted or bare, as in _FUNC_NAME_RE
      )
      _INVOKE_TARGETS_RE = re.compile(
          r'to\s+label\s+%(\S+)\s+unwind\s+label\s+%(\S+)'
      )
- Update arc_metrics.py: The _RC_INVOKE_RE already exists but is NEVER used in _count_rc_ops(). Either:
  - Extend _RC_INC_RE/_RC_DEC_RE to also match invoke.*@ori_rc_inc etc., OR
  - Add a separate count for invoke-based RC operations
  - Verify: Does Ori’s codegen ever emit invoke @ori_rc_inc? If not (which is likely, since RC functions don’t unwind), this is defensive hardening only.
- Update extract_branch_targets() in ir_utils.py: Currently only handles br and switch. Must also handle invoke targets for correct CFG construction in control_flow_metrics.py and the new rc_state.py (Section 02).
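The invoke-related items above can be sketched in one place. This is an illustration under stated assumptions, not the existing ir_utils.py or arc_metrics.py code: opcode_of, invoke_callee and count_rc_ops are hypothetical helpers, extract_branch_targets is shown with a simplified single-line signature, and ori_rc_dec is an assumed name for the decrement function (only ori_rc_inc appears in this plan). The callee pattern follows the one proposed in this section, with bare names restricted to [\w.$]+ as in 04.1.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Callee of an invoke: group 1 = quoted name, group 2 = bare name.
_INVOKE_RE = re.compile(r'invoke\b.*@(?:"([^"]+)"|([\w.$]+))\s*\(')
_INVOKE_TARGETS_RE = re.compile(r'to\s+label\s+%(\S+)\s+unwind\s+label\s+%(\S+)')
_LABEL_RE = re.compile(r'label\s+%([^\s,\]]+)')
# Assumption: the runtime RC functions are named ori_rc_inc / ori_rc_dec.
_RC_OP_RE = re.compile(r'\b(?:call|invoke)\b.*@"?(ori_rc_inc|ori_rc_dec)"?\s*\(')


def opcode_of(line: str) -> str:
    """Return the opcode, skipping an optional '%result = ' prefix."""
    body = line.strip()
    if body.startswith("%") and " = " in body:
        body = body.split(" = ", 1)[1]
    parts = body.split(None, 1)
    return parts[0] if parts else ""


def invoke_callee(line: str) -> Optional[str]:
    """Return the unquoted callee name of an invoke, or None."""
    match = _INVOKE_RE.search(line)
    if match is None:
        return None
    return match.group(1) or match.group(2)


def extract_branch_targets(line: str) -> List[str]:
    """Return successor labels for br, switch and invoke (joined to one line)."""
    opcode = opcode_of(line)
    if opcode == "invoke":
        match = _INVOKE_TARGETS_RE.search(line)
        return [match.group(1), match.group(2)] if match else []
    if opcode in ("br", "switch"):
        return _LABEL_RE.findall(line)
    return []


def count_rc_ops(lines: Iterable[str]) -> Tuple[int, int]:
    """Count (increments, decrements) across both call and invoke."""
    incs = decs = 0
    for line in lines:
        match = _RC_OP_RE.search(line)
        if match is None:
            continue
        if match.group(1) == "ori_rc_inc":
            incs += 1
        else:
            decs += 1
    return incs, decs
```

Returning the normal label before the unwind label keeps the successor order stable, which matters if downstream CFG code distinguishes the two edges by position.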
04.3 Completion Checklist
- ir_parser.py split into ir_parser.py + ir_parser_internal.py (each <=400 lines)
- _FUNC_NAME_RE handles both @name and @"name" patterns
- invoke instructions parsed with callee extraction and target extraction
- extract_branch_targets() handles invoke to/unwind targets
- arc_metrics.py counts RC ops in both call and invoke instructions
- J8 (generics): all monomorphized functions parsed (0 parse errors)
- All 12 journeys: 0 parse_errors in output
- Tests cover: bare names, quoted names, names with $, empty module, invoke instructions
- python3 -m pytest tests/test_ir_parser.py passes
Exit Criteria: parse_module() on J8’s IR returns a Module with zero parse_errors and includes all monomorphized function definitions (verified by count matching grep -c '^define' ir.txt).
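The count comparison in the exit criteria can also be done in Python rather than with grep. A minimal sketch, assuming the caller supplies the number of parsed definitions and the parse_errors list from the Module; count_define_lines and exit_criteria_failures are hypothetical helper names.

```python
import re
from typing import List, Sequence

# Mirrors `grep -c '^define'`: counts lines that start with 'define'.
_DEFINE_LINE_RE = re.compile(r'^define', re.MULTILINE)


def count_define_lines(ir_text: str) -> int:
    """Number of lines in the IR text that start with 'define'."""
    return len(_DEFINE_LINE_RE.findall(ir_text))


def exit_criteria_failures(
    ir_text: str,
    parsed_definitions: int,
    parse_errors: Sequence[str],
) -> List[str]:
    """Return human-readable failures; an empty list means the criteria hold."""
    failures: List[str] = []
    expected = count_define_lines(ir_text)
    if parsed_definitions != expected:
        failures.append(
            f"parsed {parsed_definitions} definitions, expected {expected}"
        )
    if parse_errors:
        failures.append(f"{len(parse_errors)} parse error(s)")
    return failures
```

Declarations are deliberately excluded from the expected count, since the criterion only compares function definitions.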