7 Lexical Elements
Grammar: See grammar.ebnf § LEXICAL GRAMMAR
A token is the smallest lexical unit of the language. The five token classes are: identifiers, keywords, literals, operators, and delimiters.
Whitespace — consisting of space (U+0020), horizontal tab (U+0009), and newline (U+000A, after normalization per 6.1.1) — separates tokens but is otherwise not significant. Ori is not indentation-sensitive.
At least one whitespace character or delimiter shall separate adjacent tokens that would otherwise form a single token. For instance, letx is the identifier letx, not the keyword let followed by the identifier x.
7.1 Comments
Comments begin with // and extend to the end of the line. Inline comments (comments following code on the same line) are not permitted.
// This is a valid comment
@add (a: int, b: int) -> int = a + b;
@sub (a: int, b: int) -> int = a - b; // error: inline comment
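The content of a comment is arbitrary text: any unicode_char (see 6.1) except newline may appear in a comment.
NOTE In the examples in this clause, trailing // annotations after code are explanatory only. They are not part of the example source, which the rule above would otherwise reject.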
7.1.1 Doc comments
Doc comments use special markers to classify documentation content:
| Marker | Purpose | Example |
|---|---|---|
| (none) | Description | // This is a description. |
| * | Parameter or field | // * name: Description |
| ! | Warning or panic | // ! Panics if x is negative |
| > | Example | // > func(x: 1) -> 2 |
The canonical form for member documentation is // * name: description. A space after the * and a colon after the name are always required.
Any comment immediately preceding a declaration is treated as documentation for that declaration. Non-documentation comments shall be separated from declarations by a blank line:
// TODO: refactor this later

// Computes the sum of two integers.
@add (a: int, b: int) -> int = a + b;
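A documentation tool can classify each doc-comment line by its leading marker. The Python sketch below is illustrative only: the function name classify_doc_line and the category strings it returns are hypothetical, and it assumes each input line still begins with //.

```python
import re

# Hypothetical category names; the specification defines only the markers.
MARKERS = {"*": "member", "!": "warning", ">": "example"}

# Canonical member form: "* name: description" (space after *, colon required).
MEMBER_RE = re.compile(r"^\* ([A-Za-z_][A-Za-z0-9_]*): ?(.*)$")


def classify_doc_line(line: str):
    """Classify one doc-comment line by its marker (see the table in 7.1.1)."""
    if not line.startswith("//"):
        raise ValueError("not a comment")
    body = line[2:].lstrip(" ")
    if body and body[0] in MARKERS:
        kind = MARKERS[body[0]]
        if kind == "member":
            m = MEMBER_RE.match(body)
            if m is None:
                return ("malformed_member", body)  # e.g. missing space or colon
            return ("member", m.group(1), m.group(2))
        return (kind, body[1:].strip())
    return ("description", body.strip())
```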
7.2 Identifiers
An identifier names a program entity such as a variable, type, function, or module.
identifier = ( letter | "_" ) { letter | digit | "_" } .
where letter and digit are as defined in 6.1.2.
Identifiers are restricted to ASCII letters, ASCII digits, and underscore. They are case-sensitive: count, Count, and COUNT are three distinct identifiers.
An identifier shall not be a reserved keyword (see 7.3.1) or future-reserved keyword (see 7.3.2) unless it appears in member position (after .).
The maximum identifier length is implementation-defined; an implementation shall accept identifiers of at least 1 000 characters.
Identifiers are compared byte-for-byte. No Unicode normalization (NFC, NFD, or other) is applied.
NOTE The $ and @ prefixes are sigils (see 7.6), not part of the identifier. In the binding let $timeout = 30s, the identifier is timeout; the $ marks it as immutable.
EXAMPLE Valid identifiers: x, _count, Point, my_function, x2, HTTP_STATUS.
EXAMPLE The following are not identifiers: 2fast (starts with digit), my-var (contains hyphen), let (reserved keyword).
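The identifier rules above (ASCII-only grammar, keyword exclusion, member-position exception) can be checked mechanically. This Python sketch assumes the candidate name has already been isolated; identifier_error is a hypothetical name, and the keyword sets are copied from 7.3.1 and 7.3.2.

```python
import re
from typing import Optional

# identifier = ( letter | "_" ) { letter | digit | "_" } .  (ASCII only)
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

RESERVED = set("""
    as break continue def div do else extend
    extension extern false for if impl in let
    loop match Never pub self Self suspend tests
    then trait true type unsafe use uses void
    where while with yield
""".split())
FUTURE_RESERVED = {"asm", "inline", "static", "union", "view"}


def identifier_error(name: str, member_position: bool = False) -> Optional[str]:
    """Return None if name is usable as an identifier here, else a reason."""
    if IDENT_RE.fullmatch(name) is None:
        return "not an identifier (ASCII letters, digits, and _ only)"
    if member_position:
        return None  # after '.', any keyword may name a field or method
    if name in RESERVED:
        return "reserved keyword"
    if name in FUTURE_RESERVED:
        return "reserved for a future language feature"
    return None
```

Comparison is plain string equality, so count, Count, and COUNT are distinct, matching the byte-for-byte rule.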
7.2.1 Predeclared identifiers
Certain identifiers are predeclared in the universe scope (see 11.1). These include type names (int, float, str, byte, bool, void, Never), built-in functions (print, panic, len, etc.), and predefined types (Option, Result, Error, Ordering). See Annex C for the complete list.
Predeclared identifiers may be shadowed by user declarations in inner scopes.
7.2.2 Exported identifiers
An identifier whose declaration is prefixed with pub is exported and visible to importing modules. See Clause 18.
7.2.3 Blank identifier
The underscore _ used alone as a pattern is the blank identifier. It matches any value and discards it.
let _ = compute(); // discard result
match x { _ -> 0 } // match anything
7.3 Keywords
Grammar: See grammar.ebnf § Keywords for the complete keyword listing.
7.3.1 Reserved
Reserved keywords shall not be used as identifiers except in member position (after .). The reserved keywords are:
as break continue def div do else extend
extension extern false for if impl in let
loop match Never pub self Self suspend tests
then trait true type unsafe use uses void
where while with yield
Exception: In member position (after .), any keyword may be used as a field or method name. The . prefix provides unambiguous context, so x.then(y) is a method call, not part of an if/then expression. See grammar.ebnf § member_name.
7.3.2 Reserved (future)
The following keywords are reserved for future language features. Their use as identifiers is a compile-time error with an informative message:
asm inline static union view
7.3.3 Context-sensitive
Context-sensitive keywords are recognized as keywords only in specific syntactic positions. Outside those positions, they are valid identifiers. See grammar.ebnf § Keywords for the complete listing and position rules.
7.3.4 Built-in names
Built-in names are reserved in call position, that is, when the next token is (. Outside call position they may be used as variable names. See Annex C for semantics.
7.4 Operators
Rules: See Annex B for operator semantics.
7.4.1 Precedence
Operators are listed from highest to lowest precedence:
| Prec | Operators | Associativity |
|---|---|---|
| 1 | . [] () ? as as? | Left |
| 2 | ** | Right |
| 3 | ! - ~ (unary) | Right |
| 4 | * / % div @ | Left |
| 5 | + - | Left |
| 6 | << >> | Left |
| 7 | .. ..= | Non-associative |
| 8 | < > <= >= | Non-associative |
| 9 | == != | Non-associative |
| 10 | & | Left |
| 11 | ^ | Left |
| 12 | \| | Left |
| 13 | && | Left |
| 14 | \|\| | Left |
| 15 | ?? | Right |
| 16 | \|> | Left |
NOTE Range (.., ..=), comparison (<, >, <=, >=), and equality (==, !=) operators are non-associative: a == b == c is a compile-time error rather than chaining.
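The table can drive a precedence-climbing parser directly. The Python sketch below covers only the binary levels (postfix level 1 and unary level 3 are omitted), treats each whitespace-separated token as an operand or operator, and prints the parse as fully parenthesized text. Parser and parenthesize are hypothetical names, not part of any Ori toolchain.

```python
# Binary operators from the 7.4.1 table: symbol -> (level, associativity).
BINARY = {
    "**": (2, "right"),
    "*": (4, "left"), "/": (4, "left"), "%": (4, "left"), "div": (4, "left"),
    "+": (5, "left"), "-": (5, "left"),
    "<<": (6, "left"), ">>": (6, "left"),
    "..": (7, "none"), "..=": (7, "none"),
    "<": (8, "none"), ">": (8, "none"), "<=": (8, "none"), ">=": (8, "none"),
    "==": (9, "none"), "!=": (9, "none"),
    "&": (10, "left"), "^": (11, "left"), "|": (12, "left"),
    "&&": (13, "left"), "||": (14, "left"),
    "??": (15, "right"), "|>": (16, "left"),
}


class Parser:
    """Precedence climbing over pre-split tokens; operands are single tokens."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        left = self.advance()
        if left is None or left in BINARY:
            raise SyntaxError("expected an operand")
        prev_nonassoc = None  # level of a non-associative operator just applied
        while True:
            op = self.peek()
            if op not in BINARY:
                break
            level, assoc = BINARY[op]
            bp = 20 - level  # lower level number binds tighter
            if bp < min_bp:
                break
            if assoc == "none" and prev_nonassoc == level:
                raise SyntaxError(f"'{op}' is non-associative; add parentheses")
            self.advance()
            # Right-associative operators recurse at the same binding power,
            # so a ** b ** c groups as a ** (b ** c).
            right = self.parse_expr(bp if assoc == "right" else bp + 1)
            left = f"({left} {op} {right})"
            prev_nonassoc = level if assoc == "none" else None
        return left


def parenthesize(source):
    parser = Parser(source.split())
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

For example, parenthesize("a == b == c") raises SyntaxError, while operators at different non-associative levels, such as a < b == c, still parse.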
7.4.2 Short-circuit evaluation
The logical operators && and || use short-circuit evaluation: the right operand is evaluated only if the left operand does not determine the result.
- a && b: if a is false, the result is false and b is not evaluated.
- a || b: if a is true, the result is true and b is not evaluated.
The null coalescing operator ?? also short-circuits: a ?? b evaluates b only if a is None.
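Short-circuiting can be modeled by passing the right operand as a zero-argument function that is called only when needed. In this Python sketch, and_, or_, and coalesce are hypothetical helper names, and Ori's None is assumed to map to Python's None, with Some(v) represented by v itself.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def and_(left: bool, right: Callable[[], bool]) -> bool:
    # a && b: b is evaluated only when a is true.
    return right() if left else False


def or_(left: bool, right: Callable[[], bool]) -> bool:
    # a || b: b is evaluated only when a is false.
    return True if left else right()


def coalesce(left: Optional[T], right: Callable[[], T]) -> T:
    # a ?? b: b is evaluated only when a is None (assumed mapping to Python None).
    return right() if left is None else left
```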
7.4.3 Compound assignment operators
Compound assignment operators combine a binary operation with assignment:
+= -= *= /= %= **= @= &= |= ^= <<= >>= &&= ||=
A compound assignment x op= y desugars to x = x op y at the parser level. The target x shall be a mutable binding (not $-prefixed). Compound assignments are statements, not expressions.
The operators &&= and ||= preserve short-circuit semantics: x &&= y evaluates y only if x is true, and x ||= y evaluates y only if x is false.
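Because the rewrite happens in the parser, it can be expressed as a transformation on syntax trees. The sketch below uses hypothetical Python node classes (Name, Binary, Assign), not any actual compiler representation. It shows the rewrite, the mutability check, and why &&= and ||= keep short-circuiting: the result is an ordinary && or || node.

```python
from dataclasses import dataclass
from typing import Union

COMPOUND_OPS = {"+=", "-=", "*=", "/=", "%=", "**=", "@=", "&=", "|=",
                "^=", "<<=", ">>=", "&&=", "||="}


@dataclass(frozen=True)
class Name:
    ident: str
    immutable: bool = False  # True for $-prefixed bindings


@dataclass(frozen=True)
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Assign:
    target: Name
    value: "Expr"


Expr = Union[Name, Binary, int]


def desugar_compound(target: Name, op: str, value: Expr) -> Assign:
    """Rewrite `x op= y` into `x = x op y`."""
    if op not in COMPOUND_OPS:
        raise ValueError(f"unknown compound operator {op!r}")
    if target.immutable:
        raise ValueError(f"cannot assign to immutable binding ${target.ident}")
    binary_op = op[:-1]  # strip the trailing '='
    # An && / || Binary node is evaluated with short-circuiting like any other.
    return Assign(target, Binary(binary_op, target, value))
```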
7.5 Delimiters
The following characters serve as delimiters:
| Delimiter | Name | Usage |
|---|---|---|
| ( ) | Parentheses | Grouping, function parameters, tuples |
| [ ] | Brackets | List literals, indexing, fixed-capacity |
| { } | Braces | Blocks, struct/map literals, interpolation |
| , | Comma | Element separator |
| : | Colon | Type annotation, named argument, map entry |
| . | Dot | Member access, module path |
| ; | Semicolon | Statement terminator |
7.6 Sigils
Sigils are single-character prefixes with specific meanings:
| Sigil | Purpose | Example |
|---|---|---|
| @ | Function declaration | @main () |
| $ | Immutable binding | let $timeout = 30s; |
The @ sigil marks a function declaration. It is required before function names at declaration sites and optional at call sites.
The $ sigil marks a binding as immutable. It appears at the definition site, import site, and all usage sites. See Clause 13 for details.
7.7 Literals
A literal represents a fixed value of a specific type in source code.
7.7.1 Integer literals
An integer literal is a sequence of digits representing a value of type int (64-bit signed integer).
int_lit = decimal_lit | hex_lit | binary_lit .
decimal_lit = "0" | non_zero_digit { [ "_" ] digit } .
non_zero_digit = "1" … "9" .
hex_lit = "0" ( "x" | "X" ) hex_digit { [ "_" ] hex_digit } .
binary_lit = "0" ( "b" | "B" ) bin_digit { [ "_" ] bin_digit } .
Leading zeros in decimal literals are not permitted. 007 is a compile-time error; write 7 instead. The single digit 0 is valid.
There is no octal literal prefix.
Digit separators (_) may appear between digits for readability. The following restrictions apply:
- A separator shall not appear at the beginning of the digit sequence (after the prefix, if any).
- A separator shall not appear at the end of the digit sequence.
- Two adjacent separators shall not appear.
EXAMPLE Valid integer literals: 42, 1_000_000, 0xFF, 0xFF_FF, 0b1010_0101, 0.
EXAMPLE The following are not valid: 007 (leading zero), 42_ (trailing separator), 4__2 (adjacent separators), 0x (no digits after prefix), 0b (no digits after prefix).
An integer literal shall represent a value in the range 0 to 2^63 − 1 (9 223 372 036 854 775 807). A literal value outside this range is a compile-time error.
NOTE The minimum int value (−2^63) cannot be written as a literal because the positive value 2^63 exceeds the literal range. It is available as the associated constant int.min.
Both uppercase and lowercase hex digits are accepted: 0xAB, 0xab, and 0xAb are equivalent.
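The integer-literal rules translate into one regular expression per form plus a range check. This Python sketch assumes the literal has already been isolated as a string; parse_int_literal is a hypothetical name.

```python
import re

INT_MAX = 2**63 - 1

# One pattern per grammar form; "_" may appear only between two digits.
DECIMAL = re.compile(r"0|[1-9](?:_?[0-9])*")
HEX = re.compile(r"0[xX][0-9a-fA-F](?:_?[0-9a-fA-F])*")
BINARY = re.compile(r"0[bB][01](?:_?[01])*")


def parse_int_literal(text: str) -> int:
    """Return the value of an integer literal or raise ValueError."""
    if DECIMAL.fullmatch(text):
        value = int(text.replace("_", ""), 10)
    elif HEX.fullmatch(text):
        value = int(text[2:].replace("_", ""), 16)
    elif BINARY.fullmatch(text):
        value = int(text[2:].replace("_", ""), 2)
    else:
        raise ValueError(f"invalid integer literal: {text!r}")
    if value > INT_MAX:
        raise ValueError(f"integer literal out of range: {text!r}")
    return value
```

The DECIMAL pattern rejects 007 because a leading 0 may only stand alone, and all three patterns reject leading, trailing, and doubled separators.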
7.7.2 Float literals
A float literal is a decimal representation of a value of type float (IEEE 754 binary64).
float_lit = decimal_digits "." decimal_digits [ exponent ]
| decimal_digits exponent .
exponent = ( "e" | "E" ) [ "+" | "-" ] decimal_digits .
decimal_digits = digit { [ "_" ] digit } .
A float literal shall have at least one digit before and after the decimal point. Leading-dot (.5) and trailing-dot (5.) forms are not permitted.
Digit separators follow the same rules as integer literals: 1_000.000_001 is valid.
A float literal that overflows the representable range of IEEE 754 binary64 is a compile-time error.
NOTE The special values positive infinity, negative infinity, and NaN are not literals. They are accessed via associated constants: float.inf, float.neg_inf, float.nan. Additional constants are float.max, float.min (smallest positive normal), and float.epsilon (machine epsilon).
EXAMPLE Valid float literals: 3.14, 2.5e-8, 1_000.0, 1.0e10, 6.022E23.
EXAMPLE The following are not valid: .5 (no leading digit), 5. (no trailing digit), 1e (no exponent digits), 0x1.0p10 (no hex floats).
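A similar Python sketch validates float literals against the grammar and reports binary64 overflow. It relies on Python's float, which is IEEE 754 binary64 and returns infinity for overflowing decimal input. parse_float_literal is a hypothetical name.

```python
import math
import re

DIGITS = r"[0-9](?:_?[0-9])*"
EXPONENT = rf"[eE][+-]?{DIGITS}"
# float_lit = decimal_digits "." decimal_digits [ exponent ] | decimal_digits exponent
FLOAT_RE = re.compile(rf"{DIGITS}\.{DIGITS}(?:{EXPONENT})?|{DIGITS}{EXPONENT}")


def parse_float_literal(text: str) -> float:
    """Return the binary64 value of a float literal or raise ValueError."""
    if FLOAT_RE.fullmatch(text) is None:
        raise ValueError(f"invalid float literal: {text!r}")
    value = float(text.replace("_", ""))
    if math.isinf(value):
        raise ValueError(f"float literal overflows binary64: {text!r}")
    return value
```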
7.7.3 String literals
A string literal represents a value of type str (UTF-8 encoded text). String literals are delimited by double quotes (").
string_lit = '"' { string_char | escape_seq } '"' .
string_char = unicode_char - '"' - '\' - newline .
Regular strings do not support interpolation. Braces { and } are literal characters in regular strings.
A string literal shall not contain an unescaped newline. Multi-line text shall use \n escape sequences or template strings (7.7.4).
7.7.3.1 Escape sequences
The following escape sequences are recognized in string literals, template strings, and character literals:
| Escape | Unicode | Name |
|---|---|---|
| \\ | U+005C | Backslash |
| \" | U+0022 | Double quote |
| \n | U+000A | Newline (line feed) |
| \t | U+0009 | Horizontal tab |
| \r | U+000D | Carriage return |
| \0 | U+0000 | Null |
| \u{H} | U+HHHH | Unicode code point |
| \xHH | U+00HH | Hex byte value |
The \xHH escape specifies a value using exactly two hexadecimal digits (case-insensitive). In string and char literal contexts, the value shall be in the ASCII range (\x00–\x7F); values \x80–\xFF are a compile-time error. In byte literal contexts (7.7.6), the full range \x00–\xFF is valid.
This table is exhaustive. Any other \-prefixed sequence is a compile-time error.
NOTE Legacy escape sequences (\a, \b, \f, \v) are not supported. Use \u{7} for bell (U+0007), \u{8} for backspace (U+0008), etc.
7.7.3.2 Unicode escape sequences
A unicode escape sequence \u{H} represents a single Unicode code point. The braces contain 1 to 6 hexadecimal digits (case-insensitive).
The value shall be a valid Unicode scalar value: U+0000 through U+D7FF or U+E000 through U+10FFFF. Surrogate code points U+D800 through U+DFFF are not valid and produce a compile-time error.
EXAMPLE \u{1F600} (grinning face), \u{0041} (‘A’), \u{7} (bell).
EXAMPLE \u{D800} is an error (surrogate code point).
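The escape rules can be checked with a small decoder over the text between the quotes. This Python sketch covers string literals only, so it rejects \' as required for strings. decode_string_body is a hypothetical name, and the sketch assumes the caller has already stripped the surrounding double quotes.

```python
import re

SIMPLE = {"\\": "\\", '"': '"', "n": "\n", "t": "\t", "r": "\r", "0": "\0"}
UNICODE_RE = re.compile(r"u\{([0-9a-fA-F]{1,6})\}")
HEX_RE = re.compile(r"x([0-9a-fA-F]{2})")


def decode_string_body(body: str) -> str:
    """Decode the escape sequences of a string literal body (7.7.3.1)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\n":
            raise ValueError("unescaped newline in string literal")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        rest = body[i + 1:]
        if rest[:1] in SIMPLE:
            out.append(SIMPLE[rest[0]])
            i += 2
            continue
        m = UNICODE_RE.match(rest)
        if m:
            cp = int(m.group(1), 16)
            if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
            out.append(chr(cp))
            i += 1 + m.end()
            continue
        m = HEX_RE.match(rest)
        if m:
            value = int(m.group(1), 16)
            if value > 0x7F:
                raise ValueError("\\x escape above \\x7F in a string literal")
            out.append(chr(value))
            i += 1 + m.end()
            continue
        raise ValueError(f"unknown escape sequence at offset {i}")
    return "".join(out)
```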
7.7.4 Template strings
A template string is delimited by backticks (`) and supports expression interpolation.
template_lit = '`' { template_char | escape_seq | template_escape | interpolation } '`' .
template_char = unicode_char - '`' - '\' - '{' - '}' - newline .
template_escape = "{{" | "}}" .
interpolation = '{' expression [ ':' format_spec ] '}' .
Interpolated expressions are enclosed in { }. Each interpolated expression shall implement the Printable trait (or Formattable if a format specifier is present).
Literal braces within template strings are written as {{ (left brace) and }} (right brace). Literal backticks are written as \`.
The standard escape sequences (see 7.7.3.1) are valid in template strings.
Template strings may span multiple lines. Whitespace, including newlines, is preserved exactly as written.
Nested template strings are valid. An interpolation {...} may contain any expression, including another template string.
EXAMPLE
let name = "World";
`Hello, {name}!` // "Hello, World!"
`{value:.2}` // 2 decimal places
`{count:05}` // zero-pad to 5 digits
`outer {`inner {x}`}` // nested template strings
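Lexing a template string means separating literal text from interpolations while tracking nesting. The Python sketch below shows one possible approach under simplifying assumptions: scan_template and skip_interpolation are hypothetical names, escape sequences are copied through undecoded, any format specifier stays attached to the expression text, and braces or backticks inside string literals within an interpolation are not handled.

```python
from typing import List, Tuple

Part = Tuple[str, str]  # ("text", literal text) or ("expr", expression source)


def scan_template(src: str, start: int = 0) -> Tuple[List[Part], int]:
    """Scan the template string whose opening backtick is at src[start].

    Returns the parts and the index just past the closing backtick.
    """
    if src[start] != "`":
        raise ValueError("template string must start with a backtick")
    parts: List[Part] = []
    text: List[str] = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            text.append(src[i:i + 2])  # escape kept undecoded; \` cannot end the string
            i += 2
        elif ch == "`":
            if text:
                parts.append(("text", "".join(text)))
            return parts, i + 1
        elif src.startswith("{{", i) or src.startswith("}}", i):
            text.append(ch)  # {{ and }} denote literal braces
            i += 2
        elif ch == "{":
            if text:
                parts.append(("text", "".join(text)))
                text = []
            close = skip_interpolation(src, i + 1)
            parts.append(("expr", src[i + 1:close]))
            i = close + 1
        elif ch == "}":
            raise ValueError("unmatched '}' in template string")
        else:
            text.append(ch)  # newlines and other whitespace are kept as written
            i += 1
    raise ValueError("unterminated template string")


def skip_interpolation(src: str, i: int) -> int:
    """Return the index of the '}' that closes an interpolation starting at i."""
    depth = 0
    while i < len(src):
        ch = src[i]
        if ch == "`":
            _, i = scan_template(src, i)  # skip a nested template string
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return i
            depth -= 1
        i += 1
    raise ValueError("unterminated interpolation")
```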
7.7.5 Character literals
A character literal represents a single Unicode scalar value of type char.
char_lit = "'" ( char_char | char_escape ) "'" .
char_char = unicode_char - "'" - '\' - newline .
char_escape = escape_seq | "\'" .
A character literal shall contain exactly one character or escape sequence. The following are compile-time errors:
- Empty character literal: ''
- Multi-character literal: 'ab'
- Surrogate code point (via unicode escape): '\u{D800}'
- Hex escape out of ASCII range: '\x80' through '\xFF'
The escape sequence \' (single quote) is valid in character literals but not in string or template literals. The \xHH escape is valid in character literals but restricted to the ASCII range (\x00–\x7F). Values \x80–\xFF are a compile-time error; use \u{HH} for code points above U+007F.
EXAMPLE Valid character literals: 'a', '\n', '\u{1F600}', '\0', '\'', '\x41'.
7.7.6 Byte literals
A byte literal represents an unsigned 8-bit integer value of type byte.
Grammar: See Annex A §A.LEXICAL_GRAMMAR
byte_lit = "b'" ( byte_char | byte_escape ) "'" .
byte_char = ascii_char - "'" - "\" .
byte_escape = "\\" | "\'" | "\n" | "\t" | "\r" | "\0" | "\x" hex hex .
A byte literal is prefixed with b and shall contain exactly one ASCII character or escape sequence. The type is byte.
Unescaped characters in byte literals shall be printable ASCII (U+0020–U+007E), excluding the delimiter ' and the escape introducer \. Non-ASCII characters are a compile-time error.
The \xHH escape specifies a byte value from 0x00 to 0xFF using exactly two hexadecimal digits. The full range is valid in byte literals (unlike char literals, which restrict to \x00–\x7F).
The escapes \u{...} (Unicode) and \" (double quote) are not valid in byte literals. Unicode escapes are not meaningful for bytes (which are 8-bit values), and double quotes do not need escaping since they are not the delimiter.
EXAMPLE Valid byte literals: b'a', b'\n', b'\0', b'\x1B', b'\xFF'.
EXAMPLE Invalid byte literals: b'\u{41}' (error: unicode escape in byte literal), b'é' (error: non-ASCII character).
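The byte-literal rules differ from the char rules mainly in the \xHH range and in the absence of \u{...} and \". A Python sketch, assuming the whole literal text (including the b'...' delimiters) is passed in. parse_byte_literal is a hypothetical name.

```python
import re

SIMPLE_BYTE_ESCAPES = {"\\": 0x5C, "'": 0x27, "n": 0x0A, "t": 0x09,
                       "r": 0x0D, "0": 0x00}


def parse_byte_literal(text: str) -> int:
    """Return the value (0 to 255) of a byte literal such as b'a'."""
    if not (text.startswith("b'") and text.endswith("'") and len(text) >= 4):
        raise ValueError(f"not a byte literal: {text!r}")
    body = text[2:-1]
    if len(body) == 1:
        code = ord(body)
        # Unescaped characters: printable ASCII except ' and \.
        if body in ("'", "\\") or not 0x20 <= code <= 0x7E:
            raise ValueError(f"invalid unescaped character: {body!r}")
        return code
    if body[0] == "\\":
        if len(body) == 2 and body[1] in SIMPLE_BYTE_ESCAPES:
            return SIMPLE_BYTE_ESCAPES[body[1]]
        if re.fullmatch(r"\\x[0-9a-fA-F]{2}", body):
            return int(body[2:], 16)  # full range 0x00 to 0xFF
    raise ValueError(f"invalid byte literal: {text!r}")
```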
7.7.7 Boolean literals
The boolean literals are true and false. They have type bool.
7.7.8 Duration literals
A duration literal represents a time span of type Duration. The internal representation is a 64-bit signed integer count of nanoseconds.
duration_lit = duration_number duration_suffix .
duration_number = decimal_lit | decimal_lit "." digit { digit } .
duration_suffix = "ns" | "us" | "ms" | "s" | "m" | "h" .
Digit separators are permitted in the numeric part: 1_000ms is valid.
The suffixes denote the following units:
| Suffix | Unit | Nanoseconds |
|---|---|---|
| ns | Nanoseconds | 1 |
| us | Microseconds | 1 000 |
| ms | Milliseconds | 1 000 000 |
| s | Seconds | 1 000 000 000 |
| m | Minutes | 60 000 000 000 |
| h | Hours | 3 600 000 000 000 |
Decimal syntax is compile-time sugar using integer arithmetic. The result shall be a whole number of nanoseconds; otherwise it is a compile-time error.
EXAMPLE 0.5s = 500 000 000 ns (valid). 1.5ms = 1 500 000 ns (valid). 1.5ns is an error (0.5 nanoseconds is not representable).
The range of Duration is limited by its 64-bit signed nanosecond count: approximately ±292 years.
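Because decimal duration syntax uses integer arithmetic, the fractional part can be scaled exactly without floating point. A Python sketch, assuming the literal text is already isolated. duration_nanos is a hypothetical name.

```python
import re

# Suffix -> nanoseconds, from the table above.
NANOS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000,
         "m": 60_000_000_000, "h": 3_600_000_000_000}
I64_MAX = 2**63 - 1
DURATION_RE = re.compile(r"(0|[1-9](?:_?[0-9])*)(?:\.([0-9]+))?(ns|us|ms|s|m|h)")


def duration_nanos(text: str) -> int:
    """Return the nanosecond count of a duration literal such as 1.5ms."""
    m = DURATION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid duration literal: {text!r}")
    whole, frac, suffix = m.groups()
    frac = frac or ""
    scale = 10 ** len(frac)
    # (whole + frac / scale) * unit, kept exact by multiplying through by scale.
    scaled = (int(whole.replace("_", "")) * scale + int(frac or "0")) * NANOS[suffix]
    if scaled % scale != 0:
        raise ValueError(f"{text!r} is not a whole number of nanoseconds")
    nanos = scaled // scale
    if nanos > I64_MAX:
        raise ValueError(f"{text!r} exceeds the 64-bit nanosecond range")
    return nanos
```

For 1.5ns the scaled numerator is 15 with scale 10, which leaves a remainder, so the literal is rejected exactly as the example above requires.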
7.7.9 Size literals
A size literal represents a quantity of data of type Size. The internal representation is a 64-bit signed integer count of bytes (non-negative).
size_lit = size_number size_suffix .
size_number = decimal_lit | decimal_lit "." digit { digit } .
size_suffix = "b" | "kb" | "mb" | "gb" | "tb" .
Digit separators are permitted in the numeric part: 10_000kb is valid.
Size units use SI prefixes (1000-based, not 1024-based):
| Suffix | Unit | Bytes |
|---|---|---|
| b | Bytes | 1 |
| kb | Kilobytes | 1 000 |
| mb | Megabytes | 1 000 000 |
| gb | Gigabytes | 1 000 000 000 |
| tb | Terabytes | 1 000 000 000 000 |
Decimal syntax follows the same rules as duration literals: the result shall be a whole number of bytes.
EXAMPLE 1.5kb = 1 500 bytes (valid). 0.5b is an error (0.5 bytes is not representable).
Negative size literals are compile-time errors. A Size value shall be non-negative.
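The same exact-integer technique applies to size literals with SI units. A Python sketch, assuming the literal text is already isolated. size_bytes is a hypothetical name. The grammar has no sign, so a negative value can only arise from applying unary minus to a size literal, which the sketch does not model.

```python
import re

# Suffix -> bytes (SI, 1000-based), from the table above.
BYTES = {"b": 1, "kb": 1_000, "mb": 1_000_000, "gb": 1_000_000_000,
         "tb": 1_000_000_000_000}
SIZE_RE = re.compile(r"(0|[1-9](?:_?[0-9])*)(?:\.([0-9]+))?(b|kb|mb|gb|tb)")


def size_bytes(text: str) -> int:
    """Return the byte count of a size literal such as 1.5kb."""
    m = SIZE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid size literal: {text!r}")
    whole, frac, suffix = m.groups()
    frac = frac or ""
    scale = 10 ** len(frac)
    scaled = (int(whole.replace("_", "")) * scale + int(frac or "0")) * BYTES[suffix]
    if scaled % scale != 0:
        raise ValueError(f"{text!r} is not a whole number of bytes")
    total = scaled // scale
    if total > 2**63 - 1:
        raise ValueError(f"{text!r} exceeds the 64-bit range")
    return total
```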
7.8 Semicolons
Semicolons (;) serve as statement terminators inside block expressions. At the top level, a semicolon also terminates any declaration whose body is an expression rather than a block (see 7.8.2).
7.8.1 Block semicolons
Within a block expression { ... }, semicolons govern statement termination and the block’s result value:
- Every statement in a block shall be terminated by ;.
- The last element of a block, if not terminated by ;, is the result expression: its value becomes the value of the block.
- If every element in a block is terminated by ;, the block has type void.
The following constructs are statements (require ; when not the last element):
- let bindings
- use imports
- assignments and compound assignments
- expression statements (an expression evaluated for its side effect)
EXAMPLE
{
let $x = 1; // statement: needs ;
let $y = 2; // statement: needs ;
x + y // result expression: no ;
}
// Block has type int, value 3
EXAMPLE
{
print(msg: "hello");
print(msg: "world");
}
// All elements have ; → block has type void
7.8.2 Declaration termination
A top-level declaration whose body is an expression (not a block) shall be terminated by ;. A declaration whose body ends with } does not require ;.
EXAMPLE
@add (a: int, b: int) -> int = a + b; // expression body: needs ;
@process (x: int) -> int = { let $y = x * 2; y } // block body: no ;
type Point = { x: int, y: int } // ends with }: no ;
let $MAX = 100; // module-level constant: needs ;
7.9 Trailing commas
Trailing commas are permitted after the last element in all comma-separated lists: parameter lists, argument lists, list literals, map literals, struct fields, variant payloads, match arms, and generic parameter lists.
The formatter inserts trailing commas in multi-line constructs and removes them in single-line constructs.
EXAMPLE
type Color = {
r: int,
g: int,
b: int, // trailing comma permitted
}
7.10 Lexer-parser contract
The lexer produces a flat stream of minimal tokens. The parser combines adjacent tokens based on context.
7.10.1 Greater-than sequences
The lexer produces individual > tokens. It never produces >>, >=, or >>= as single tokens.
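In expression and statement context, adjacent tokens with no intervening whitespace form compound operators:
- > followed immediately by > → right shift >>
- > followed immediately by = → greater-or-equal >=
- > > = with no whitespace between them → right-shift assignment >>= (see 7.4.3)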
In type context, > closes a generic parameter list.
// Each > is a separate token — nested generics parse correctly
let x: Result<Result<int, str>, str> = Ok(Ok(1));
// In expressions, >> is right shift
let y = 8 >> 2;
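The following Python sketch illustrates this division of labor. A toy lexer emits every punctuation character as its own token together with its source offset, and an expression-context pass merges > with an immediately adjacent > or =. lex and combine_gt are hypothetical names; a real lexer handles many more token kinds, and a type-context parser would consume the unmerged > tokens one at a time.

```python
from typing import List, Tuple

Token = Tuple[str, int]  # (text, start offset in the source)


def lex(source: str) -> List[Token]:
    """Toy lexer: words are maximal runs; every other character is one token."""
    tokens: List[Token] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalnum() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append((source[start:i], start))
        else:
            tokens.append((ch, i))  # never emits '>>' or '>=' as one token
            i += 1
    return tokens


def combine_gt(tokens: List[Token]) -> List[str]:
    """Expression context: merge '>' with directly adjacent '>' and/or '='."""
    out: List[str] = []
    i = 0
    while i < len(tokens):
        text, pos = tokens[i]
        if text != ">":
            out.append(text)
            i += 1
            continue
        op, end, j = ">", pos + 1, i + 1
        if j < len(tokens) and tokens[j] == (">", end):
            op, end, j = op + ">", end + 1, j + 1
        if j < len(tokens) and tokens[j] == ("=", end):
            op, j = op + "=", j + 1
        out.append(op)
        i = j
    return out
```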
7.10.2 Token classification
The lexer classifies tokens by their first character:
- Digit → integer, float, duration, or size literal
- Letter or underscore → identifier or keyword (or a byte literal, when b is immediately followed by ')
- " → string literal
- ` → template string literal
- ' → character literal
- Operator character → operator
Reserved keywords take precedence over identifiers: the sequence let is always the keyword let, never the identifier let.
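First-character classification combined with maximal munch explains why letx is one identifier while let x starts with a keyword. In this Python sketch, classify_start and scan_word are hypothetical names, the class labels are illustrative, and the byte-literal check for b' is left to the caller.

```python
import string

RESERVED = set("""
    as break continue def div do else extend
    extension extern false for if impl in let
    loop match Never pub self Self suspend tests
    then trait true type unsafe use uses void
    where while with yield
""".split())
ASCII_LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
WORD_CHARS = ASCII_LETTERS | DIGITS | {"_"}


def classify_start(ch: str) -> str:
    """Token class implied by a token's first character (7.10.2)."""
    if ch in DIGITS:
        return "number"  # integer, float, duration, or size literal
    if ch in ASCII_LETTERS or ch == "_":
        return "word"  # identifier or keyword (b' would start a byte literal)
    return {'"': "string", "`": "template", "'": "char"}.get(ch, "other")


def scan_word(source: str, start: int = 0) -> tuple:
    """Consume the longest run of identifier characters, then look it up."""
    end = start
    while end < len(source) and source[end] in WORD_CHARS:
        end += 1
    word = source[start:end]
    return ("keyword" if word in RESERVED else "identifier", word)
```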
7.11 Disambiguation
7.11.1 Struct literals
An uppercase identifier followed by { is interpreted as a struct literal in expression context. In if condition context, a struct literal is not permitted because the { would be ambiguous with a block.
// Struct literal in expression — valid
let p = Point { x: 1, y: 2 };
// In if condition — error: struct literal not allowed
if Point { x: 1, y: 2 }.valid then ... // error
// Use parentheses to disambiguate
if (Point { x: 1, y: 2 }).valid then ... // OK
// In then-branch — valid: no ambiguity
if condition then Point { x: 1, y: 2 } else default
NOTE The empty brace pair { } (with whitespace) is parsed as an empty map literal, not an empty block.
7.11.2 Soft keywords
The following identifiers are keywords only when followed by ( in expression position:
cache catch for match parallel
recurse spawn timeout try with
The identifier by is a keyword only when it follows a range expression (.. or ..=):
0..10 by 2 // by is a keyword (range step)
let by = 2;
0..10 by by // first by is keyword, second is variable
Outside these contexts, soft keywords may be used as variable names.
7.11.3 Parenthesized expressions
A parenthesized expression (...) is interpreted as:
- Lambda parameters, if followed by -> and the contents match parameter syntax
- Tuple, if it contains at least two comma-separated elements: (a, b)
- Unit, if empty: ()
- Grouped expression, otherwise
NOTE A single-element tuple is not supported. (a,) is parsed as a parenthesized expression (the trailing comma is ignored), not a single-element tuple.
(x) -> x + 1 // lambda with one parameter
(x, y) -> x + y // lambda with two parameters
(a, b) // tuple
() // unit
(a + b) * c // grouped expression