Recursive Descent Parsing
The Ori parser uses recursive descent, a top-down parsing technique where each grammar rule becomes a function.
What is Recursive Descent?
In recursive descent:
- Each non-terminal in the grammar has a corresponding parse function
- Functions call each other according to grammar rules
- Lookahead determines which production to use
Grammar rule: Parse function:
expr -> term '+' expr fn parse_expr() {
let left = parse_term();
expect('+');
let right = parse_expr();
}
Grammar Structure
Module Level
module -> item*
item -> function | type_def | test | import | config
fn parse_module(&mut self) -> Module {
let mut functions = Vec::new();
let mut types = Vec::new();
let mut tests = Vec::new();
let mut imports = Vec::new();
while !self.at_end() {
match self.current() {
TokenKind::At => {
// Could be function or test
if self.peek_test() {
tests.push(self.parse_test());
} else {
functions.push(self.parse_function());
}
}
TokenKind::Type => types.push(self.parse_type_def()),
TokenKind::Use => imports.push(self.parse_import()),
TokenKind::Dollar => configs.push(self.parse_config()),
_ => self.error_unexpected(),
}
}
Module { functions, types, tests, imports }
}
Function Level
function -> attributes? 'pub'? '@' IDENT generics? params return_type? capabilities? '=' expr
fn parse_function(&mut self) -> Function {
let attrs = self.parse_attributes();
let is_pub = self.eat(TokenKind::Pub);
self.expect(TokenKind::At);
let name = self.parse_ident();
let generics = self.parse_optional_generics();
let params = self.parse_params();
let ret_type = self.parse_optional_return_type();
let capabilities = self.parse_optional_capabilities();
self.expect(TokenKind::Eq);
let body = self.parse_expr();
Function {
attrs, is_pub, name, generics,
params, ret_type, capabilities, body
}
}
Expression Level
expr -> if_expr | for_expr | match_expr | binary_expr
binary_expr -> unary_expr (binary_op unary_expr)*
unary_expr -> '-' unary_expr | '!' unary_expr | primary_expr
primary_expr -> literal | ident | call | '(' expr ')' | ...
fn parse_expr(&mut self) -> ExprId {
match self.current() {
TokenKind::If => self.parse_if(),
TokenKind::For => self.parse_for(),
TokenKind::Match => self.parse_match(),
_ => self.parse_binary_expr(),
}
}
fn parse_binary_expr(&mut self) -> ExprId {
self.parse_precedence(0)
}
fn parse_unary(&mut self) -> ExprId {
match self.current() {
TokenKind::Minus => {
self.advance();
let operand = self.parse_unary();
self.alloc(ExprKind::Unary { op: UnaryOp::Neg, operand })
}
TokenKind::Bang => {
self.advance();
let operand = self.parse_unary();
self.alloc(ExprKind::Unary { op: UnaryOp::Not, operand })
}
_ => self.parse_postfix(),
}
}
Operator Precedence Parsing
For binary expressions, we use Pratt parsing (precedence climbing):
fn parse_precedence(&mut self, min_prec: u8) -> ExprId {
let mut left = self.parse_unary();
loop {
let (op, prec, assoc) = match self.binary_op_info() {
Some(info) => info,
None => break,
};
if prec < min_prec {
break;
}
self.advance();
// Right associativity: use same prec, left: use prec + 1
let next_prec = if assoc == Assoc::Right { prec } else { prec + 1 };
let right = self.parse_precedence(next_prec);
left = self.alloc(ExprKind::Binary { left, op, right });
}
left
}
Lookahead
Sometimes we need to look ahead to decide:
fn parse_item(&mut self) {
match self.current() {
TokenKind::At => {
// @ could be function or test
// Look ahead to see if "tests" keyword follows
if self.lookahead_is_test() {
self.parse_test()
} else {
self.parse_function()
}
}
// ...
}
}
fn lookahead_is_test(&self) -> bool {
// @test_name tests @target ...
// Look for "tests" keyword after @ident
let mut pos = self.pos + 2; // Skip @ and ident
matches!(self.tokens.get(pos), Some(TokenKind::Tests))
}
Benefits of Recursive Descent
- Easy to understand - Grammar maps directly to code
- Good error messages - Know context when errors occur
- Flexible - Can handle context-sensitive syntax
- No external tools - Pure Rust implementation
Limitations
- Left recursion - Cannot handle
A -> A bdirectly - Backtracking - May need lookahead for ambiguity
- Repetition - Grammar rules become nested loops
Handling Left Recursion
Left recursion is transformed:
// Original (left recursive - won't work):
expr -> expr '+' term | term
// Transformed (iterative):
expr -> term ('+' term)*
// Transformed version
fn parse_add_expr(&mut self) -> ExprId {
let mut left = self.parse_term();
while self.check(&TokenKind::Plus) {
self.advance();
let right = self.parse_term();
left = self.alloc(ExprKind::Binary {
left, op: BinaryOp::Add, right
});
}
left
}
Ori-Specific Parsing
Named Arguments
// name: expr
fn parse_named_arg(&mut self) -> NamedArg {
self.expect(TokenKind::Dot);
let name = self.parse_ident();
self.expect(TokenKind::Colon);
let value = self.parse_expr();
NamedArg { name, value }
}
Pattern Expressions
// map(over: items, transform: fn)
fn parse_pattern_call(&mut self, name: Name) -> ExprId {
self.expect(TokenKind::LParen);
let mut args = Vec::new();
while !self.check(&TokenKind::RParen) {
args.push(self.parse_named_arg());
if !self.check(&TokenKind::RParen) {
self.expect(TokenKind::Comma);
}
}
self.expect(TokenKind::RParen);
self.alloc(ExprKind::Pattern { name, args })
}
Function Definitions
// @name (params) -> Type = body
fn parse_function(&mut self) -> Function {
self.expect(TokenKind::At);
let name = self.parse_ident();
self.expect(TokenKind::LParen);
let params = self.parse_params();
self.expect(TokenKind::RParen);
let ret_type = if self.check(&TokenKind::Arrow) {
self.advance();
Some(self.parse_type())
} else {
None
};
self.expect(TokenKind::Eq);
let body = self.parse_expr();
Function { name, params, ret_type, body }
}