Desugaring

What Is Desugaring?

Every programming language faces a tension: the surface syntax users write should be expressive and ergonomic, but the internal representation compilers operate on should be minimal and uniform. Desugaring is the compiler phase that resolves this tension by translating convenient surface syntax (“syntactic sugar”) into a smaller set of core constructs that the rest of the compiler understands.

The term comes from Peter Landin’s 1964 observation that some language constructs are “syntactic sugar” — they make the language sweeter to use but add no fundamental expressiveness. x += 1 is sugar for x = x + 1. Template strings like `Hello, {name}` are sugar for string concatenation. Named arguments like f(x: 1, y: 2) are sugar for positional calls f(1, 2). Each of these can be mechanically translated into simpler forms without losing meaning.

Why Compilers Desugar

Without desugaring, every downstream phase — type checking, optimization, code generation — must handle both the sugared and unsugared forms of each construct. This creates a multiplicative complexity problem:

N sugar variants × M compiler phases = N×M code paths to maintain

Desugaring collapses this to:

N translations (once) + M phases × core variants only

A language with 7 sugar variants and 3 downstream phases goes from 7 × 3 = 21 sugar-handling code paths to 7 one-time translations, after which each of the 3 phases handles core constructs only. More importantly, each downstream phase can be written and tested against a single canonical form, eliminating entire classes of “forgot to handle this variant” bugs.

The Desugaring Contract

A correct desugaring transformation must preserve three properties:

  1. Semantics — the desugared form produces the same result as the original
  2. Types — if the original type-checks, the desugared form type-checks identically
  3. Errors — error messages still point to the original source location (via span preservation)

The transformation is purely mechanical — it reorders, restructures, or replaces syntax nodes, but never performs semantic analysis, type inference, or optimization. Those are the jobs of other phases.
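The third property (errors still point at the original source) can be sketched in Rust as follows. This is a minimal illustration, not Ori's implementation: `Span`, `Sugared`, `Core`, and `desugar` are hypothetical names, and the only sugar modeled is the `x += 1` form mentioned earlier.

```rust
/// Byte range in the original source (hypothetical; not Ori's real span type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: u32,
    end: u32,
}

/// Surface syntax, including the sugar form `target += value`.
#[derive(Debug, Clone, PartialEq)]
enum Sugared {
    Int(i64, Span),
    Var(String, Span),
    AddAssign { target: String, value: Box<Sugared>, span: Span },
}

/// Core syntax: plain assignment and addition only.
#[derive(Debug, Clone, PartialEq)]
enum Core {
    Int(i64, Span),
    Var(String, Span),
    Add(Box<Core>, Box<Core>, Span),
    Assign { target: String, value: Box<Core>, span: Span },
}

impl Core {
    /// Source location an error about this node should point at.
    fn span(&self) -> Span {
        match self {
            Core::Int(_, span) | Core::Var(_, span) | Core::Add(_, _, span) => *span,
            Core::Assign { span, .. } => *span,
        }
    }
}

/// Mechanical rewrite: no type inference, no optimization, spans carried over.
fn desugar(expr: &Sugared) -> Core {
    match expr {
        Sugared::Int(n, span) => Core::Int(*n, *span),
        Sugared::Var(name, span) => Core::Var(name.clone(), *span),
        Sugared::AddAssign { target, value, span } => {
            // Nodes synthesized here have no source text of their own, so they
            // inherit the span of the sugar they came from.
            let read = Core::Var(target.clone(), *span);
            let sum = Core::Add(Box::new(read), Box::new(desugar(value)), *span);
            Core::Assign { target: target.clone(), value: Box::new(sum), span: *span }
        }
    }
}
```

A diagnostic raised on the synthesized `Add` node would then report the range of the original `x += 1`, while the operand the user actually wrote keeps its own narrower span.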

When To Desugar: Early vs Late

Compilers choose different points in the pipeline for desugaring:

| Strategy | Examples | Tradeoff |
| --- | --- | --- |
| Before type checking | Haskell (do-notation), Scala (for-comprehensions) | Simpler type checker, but sugar-aware error messages are harder |
| After type checking | Ori, Roc, Elm | Type checker sees sugar (better errors), backends see core (simpler codegen) |
| Progressive | GHC (Core → STG → Cmm), Rust (HIR → MIR) | Multiple IR levels, each desugaring further |

Ori desugars after type checking — the type checker operates on the full sugared AST with named arguments, template literals, and spreads, producing richer error messages that reference the syntax the user actually wrote. Only after types are resolved does canonicalization strip the sugar away.

How Ori Desugars

Desugaring is integrated into lower_expr(), not a separate pass. When the lowerer encounters a sugar variant in the AST, it calls the corresponding desugar_* method inline, producing canonical expressions directly. This eliminates 7 syntactic sugar variants from the AST in a single traversal alongside all other lowering work. The transformation is mechanical and type-preserving — no semantic analysis occurs.

The Core Idea: Two Distinct Types

The key architectural decision is that CanExpr (canonical) is a separate type from ExprKind (AST). Sugar variants exist in ExprKind but have no corresponding variant in CanExpr:

| ExprKind (AST) | CanExpr (Canonical) | What Happens |
| --- | --- | --- |
| CallNamed | (none) | Reordered to positional Call |
| MethodCallNamed | (none) | Reordered to positional MethodCall |
| TemplateLiteral | (none) | Expanded to concatenation chain |
| TemplateFull | (none) | Expanded to concatenation chain |
| ListWithSpread | (none) | Lowered to method calls |
| MapWithSpread | (none) | Lowered to method calls |
| StructWithSpread | (none) | Lowered to method calls |

Because these variants don’t exist in CanExpr, a backend that tries to match on CallNamed gets a compile error — not a runtime panic, not a silent bug. The Rust type system enforces that all sugar has been eliminated before any backend sees the IR.

This is stronger than the common approach of using a shared AST type with a “should never appear after desugaring” comment. Comments don’t prevent bugs; distinct types do.
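The two-type idea, together with the inline dispatch from lower_expr() to a desugar_* helper, can be sketched in Rust like this. The enums are heavily simplified stand-ins for the real ExprKind and CanExpr, and the sketch assumes the type checker has already attached a resolved parameter index to every named argument; that annotation and the `count_calls` backend pass are illustrative assumptions, not Ori's actual data layout.

```rust
/// Surface AST (simplified stand-in): still contains sugar.
#[derive(Debug, Clone, PartialEq)]
enum ExprKind {
    Int(i64),
    Call { func: String, args: Vec<ExprKind> },
    // Sugar. Each argument carries the parameter index resolved for it
    // during type checking (an assumption made to keep the sketch short).
    CallNamed { func: String, args: Vec<(usize, ExprKind)> },
}

/// Canonical IR (simplified stand-in): there is no CallNamed variant at all.
#[derive(Debug, Clone, PartialEq)]
enum CanExpr {
    Int(i64),
    Call { func: String, args: Vec<CanExpr> },
}

/// Single traversal: core forms are lowered directly, sugar is dispatched
/// to a desugar_* helper inline.
fn lower_expr(expr: &ExprKind) -> CanExpr {
    match expr {
        ExprKind::Int(n) => CanExpr::Int(*n),
        ExprKind::Call { func, args } => CanExpr::Call {
            func: func.clone(),
            args: args.iter().map(lower_expr).collect(),
        },
        ExprKind::CallNamed { func, args } => desugar_call_named(func, args),
    }
}

fn desugar_call_named(func: &str, args: &[(usize, ExprKind)]) -> CanExpr {
    // Pure reordering: sort by resolved parameter index, then lower each argument.
    let mut ordered: Vec<&(usize, ExprKind)> = args.iter().collect();
    ordered.sort_by_key(|pair| pair.0);
    CanExpr::Call {
        func: func.to_string(),
        args: ordered.into_iter().map(|(_, arg)| lower_expr(arg)).collect(),
    }
}

/// A stand-in backend pass. The match is exhaustive over CanExpr; an arm for
/// `CanExpr::CallNamed` would be rejected by the compiler because no such
/// variant exists.
fn count_calls(expr: &CanExpr) -> usize {
    match expr {
        CanExpr::Int(_) => 0,
        CanExpr::Call { args, .. } => 1 + args.iter().map(count_calls).sum::<usize>(),
    }
}
```

Adding a new sugar variant to `ExprKind` in this arrangement forces a new arm in `lower_expr` and nothing else; `count_calls` and every other consumer of `CanExpr` stay untouched.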

Sugar Variants Eliminated

1. Named Calls → Positional Calls

Ori requires named arguments at call sites, which improves readability but means argument order in source may differ from parameter order in the function signature:

// Source: arguments in any order, with labels
fetch(timeout: 5s, url: "https://example.com")

// Canonical: arguments reordered to match signature
fetch("https://example.com", 5s)

The lowerer uses the function’s type signature (resolved during type checking) to map each named argument to its positional index. This is a pure reordering — no values change, only their position in the argument list.

ExprKind::CallNamed becomes CanExpr::Call with arguments in signature order.
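The label-to-position mapping itself might look like the following Rust sketch. `reorder_named_args` is a hypothetical helper; it takes the parameter names in signature order and is generic over whatever represents an argument. In a pipeline like Ori's, unknown, duplicate, or missing labels would presumably already have been rejected by the type checker, so the error cases here exist only to make the sketch total.

```rust
/// Reorders labeled arguments to match `params`, the parameter names in
/// signature order. Values are moved, never changed.
fn reorder_named_args<T>(params: &[&str], named: Vec<(&str, T)>) -> Result<Vec<T>, String> {
    // One empty slot per parameter, filled as labels are matched.
    let mut slots: Vec<Option<T>> = params.iter().map(|_| None).collect();

    for (label, value) in named {
        let index = params
            .iter()
            .position(|p| *p == label)
            .ok_or_else(|| format!("unknown argument `{label}`"))?;
        if slots[index].is_some() {
            return Err(format!("duplicate argument `{label}`"));
        }
        slots[index] = Some(value);
    }

    // Every slot must be filled; the result is in signature order.
    slots
        .into_iter()
        .zip(params)
        .map(|(slot, name)| slot.ok_or_else(|| format!("missing argument `{name}`")))
        .collect()
}
```

For the fetch example, calling it with the signature order `["url", "timeout"]` and the source-order arguments `timeout` then `url` returns the two values with `url` first.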

2. Named Method Calls → Positional Method Calls

The same named-to-positional translation applies to method calls:

// Source
list.filter(predicate: x -> x > 0)

// Canonical
list.filter(x -> x > 0)

ExprKind::MethodCallNamed becomes CanExpr::MethodCall with positional arguments.

3. Template Literals → String Concatenation

Template literals are convenient syntax for building strings with embedded expressions. The lowerer expands them into explicit concatenation chains with to_str() conversions:

// Source
`Hello, {name}! You are {age} years old.`

// Canonical
"Hello, " ++ to_str(name) ++ "! You are " ++ to_str(age) ++ " years old."

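Each interpolated expression is wrapped in a to_str() call (the Printable trait’s method), and the segments are then joined with the ++ concat operator. A template with N interpolations has at most 2N + 1 segments (N interpolations plus up to N + 1 literal pieces), so it becomes a left-folded chain of at most 2N concat operations; the example above has N = 2 and four concats.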

ExprKind::TemplateLiteral and TemplateFull become chains of CanExpr::BinaryOp(Concat).
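A left fold over the template's segments is enough to build such a chain. The Rust sketch below uses hypothetical `Segment` and `CanExpr` types (interpolations are restricted to variable names for brevity) and is not taken from Ori's source.

```rust
/// One piece of a template literal (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    Text(String),
    Interp(String), // name of the interpolated variable
}

#[derive(Debug, Clone, PartialEq)]
enum CanExpr {
    Str(String),
    Var(String),
    ToStr(Box<CanExpr>),
    Concat(Box<CanExpr>, Box<CanExpr>),
}

/// Expands segments into a left-nested chain: ((a ++ b) ++ c) ++ d ...
fn desugar_template(segments: &[Segment]) -> CanExpr {
    let mut parts = segments.iter().map(|seg| match seg {
        Segment::Text(text) => CanExpr::Str(text.clone()),
        // Interpolated expressions are wrapped in a to_str conversion.
        Segment::Interp(name) => CanExpr::ToStr(Box::new(CanExpr::Var(name.clone()))),
    });
    // An empty template lowers to the empty string.
    let first = parts.next().unwrap_or_else(|| CanExpr::Str(String::new()));
    parts.fold(first, |acc, part| CanExpr::Concat(Box::new(acc), Box::new(part)))
}

/// Counts Concat nodes, to check the size of the resulting chain.
fn count_concats(expr: &CanExpr) -> usize {
    match expr {
        CanExpr::Concat(left, right) => 1 + count_concats(left) + count_concats(right),
        CanExpr::ToStr(inner) => count_concats(inner),
        CanExpr::Str(_) | CanExpr::Var(_) => 0,
    }
}
```

The five segments of the "Hello, {name}! You are {age} years old." example yield four `Concat` nodes, with the final literal as the right operand of the outermost one.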

4. List Spread → Method Calls

List spread syntax creates a new list by combining existing lists with new elements:

// Source
[...existing, new_item]

// Canonical
existing.append(new_item)

ExprKind::ListWithSpread becomes method calls on list operations.
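One plausible way to generalize this to literals with several spreads and elements is to seed an accumulator with the first entry and chain one method call per remaining entry. The Rust sketch below is an assumption about the shape of the lowering, not Ori's actual algorithm: `append` comes from the example above, while `concat` (for a spread that is not in first position) is a hypothetical method name.

```rust
#[derive(Debug, Clone, PartialEq)]
enum CanExpr {
    Var(String),
    List(Vec<CanExpr>),
    MethodCall { receiver: Box<CanExpr>, method: String, args: Vec<CanExpr> },
}

/// One entry of a list literal: a plain element or a `...spread`.
#[derive(Debug, Clone, PartialEq)]
enum ListElem {
    Item(CanExpr),
    Spread(CanExpr),
}

fn method_call(receiver: CanExpr, method: &str, arg: CanExpr) -> CanExpr {
    CanExpr::MethodCall {
        receiver: Box::new(receiver),
        method: method.to_string(),
        args: vec![arg],
    }
}

fn desugar_list_spread(elems: Vec<ListElem>) -> CanExpr {
    let mut iter = elems.into_iter();
    // Seed the chain with the first entry.
    let mut acc = match iter.next() {
        None => return CanExpr::List(Vec::new()),
        Some(ListElem::Spread(list)) => list,
        Some(ListElem::Item(item)) => CanExpr::List(vec![item]),
    };
    // Each later entry becomes one method call on the accumulated list.
    for elem in iter {
        acc = match elem {
            ListElem::Item(item) => method_call(acc, "append", item),
            // `concat` is a hypothetical name for list-plus-list.
            ListElem::Spread(list) => method_call(acc, "concat", list),
        };
    }
    acc
}
```

With the entries `...existing` and `new_item`, this produces the same `existing.append(new_item)` call shown above.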

5. Map Spread → Method Calls

Map spread merges maps and adds new key-value pairs:

// Source
{...base_map, key: value}

// Canonical
base_map.with(key, value)

ExprKind::MapWithSpread becomes method calls on map operations.

6. Struct Spread → Method Calls

Struct spread creates a copy of a struct with some fields overridden:

// Source
Point { ...base, x: 10 }

// Canonical
base.with_x(10)

This is a functional update — the original struct is not mutated. The desugared form uses synthesized with_* method calls that produce new struct values.

ExprKind::StructWithSpread becomes method calls that produce updated struct values.
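Both halves of this can be sketched in Rust. The first part shows the runtime meaning of a `with_x` method using Rust's own struct update syntax as a stand-in for a synthesized method; the second shows a fold that chains one `with_*` call per overridden field. The `CanExpr` shape and `desugar_struct_spread` are hypothetical, and how Ori handles several overrides at once is an assumption here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i64,
    y: i64,
}

impl Point {
    /// Hand-written stand-in for a synthesized `with_x`: returns a new value,
    /// leaving the receiver's original untouched.
    fn with_x(self, x: i64) -> Point {
        Point { x, ..self }
    }
}

#[derive(Debug, Clone, PartialEq)]
enum CanExpr {
    Var(String),
    Int(i64),
    MethodCall { receiver: Box<CanExpr>, method: String, args: Vec<CanExpr> },
}

/// Lowers `S { ...base, f1: v1, f2: v2 }` to `base.with_f1(v1).with_f2(v2)`.
fn desugar_struct_spread(base: CanExpr, overrides: Vec<(String, CanExpr)>) -> CanExpr {
    overrides
        .into_iter()
        .fold(base, |receiver, (field, value)| CanExpr::MethodCall {
            receiver: Box::new(receiver),
            method: format!("with_{field}"),
            args: vec![value],
        })
}
```

Because `Point` is `Copy`, `base.with_x(10)` leaves `base` usable afterwards, which mirrors the non-mutating behavior described above.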

Earlier Desugarings (Before Canonicalization)

Not all desugaring happens in ori_canon. Some transformations occur earlier in the pipeline:

| Sugar | Desugared To | Where |
| --- | --- | --- |
| x \|> f(a:) (pipe) | let tmp = x; f(a: tmp) | Type checker |
| x += 1 (compound assign) | x = x + 1 | Parser |
| x < y (comparison ops) | x.compare(y).is_less() | Type checker |
| for x in items yield x * 2 | Iterator/collect expansion | Type checker |

These desugar earlier either because they are purely syntactic (compound assignment, which the parser can rewrite on its own) or because they affect type inference and need semantic information that only the type checker has. Pipe desugaring, for example, must determine which parameter to fill, which requires knowing the function’s signature, and that is type-level information.

The canonicalization phase handles only the sugar that survived through type checking intact.
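The comparison row has a close analogue in Rust's standard library, which may help show why it needs type information: which comparison implementation runs depends on the operand type. The sketch below uses Rust's real `Ord::cmp` and `Ordering::is_lt` as stand-ins for the `compare` and `is_less` methods named in the table; `less_than` and `less_than_explicit` are illustrative helper names.

```rust
use std::cmp::Ordering;

/// `x < y` written in the compare-then-test shape from the table.
/// The `T: Ord` bound is resolved per type, which is why this rewrite
/// belongs in a phase that knows the operand types.
fn less_than<T: Ord>(x: &T, y: &T) -> bool {
    x.cmp(y).is_lt()
}

/// The same test with the intermediate ordering value made explicit.
fn less_than_explicit<T: Ord>(x: &T, y: &T) -> bool {
    matches!(x.cmp(y), Ordering::Less)
}
```

Both helpers agree with the built-in `<` operator on any totally ordered type, such as integers or strings.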

Design Rationale

Desugaring at the canonical IR boundary means:

  1. Backends are simpler: ori_eval and ori_arc handle fewer expression variants, reducing code and bug surface
  2. Type-level enforcement: CanExpr physically cannot represent sugar, so backends can’t accidentally miss a case
  3. Single point of truth: desugaring logic lives in one place (ori_canon/src/desugar/), not duplicated across backends
  4. Testing is focused: each backend tests core semantics, not sugar-to-core translation
  5. Better error messages: because the type checker sees the sugared AST, errors reference named arguments, template syntax, and spreads that the user actually wrote, rather than the desugared equivalents they never see

Prior Art

| Language | Approach |
| --- | --- |
| Haskell/GHC | Progressive desugaring: source → Core (eliminates do-notation, list comprehensions, where-clauses) → STG → Cmm. Each IR level is simpler than the last. |
| Rust | HIR desugars for loops, ? operator, async/await. MIR desugars further (drops, pattern matching). Two-stage approach. |
| Roc | ast::Expr → can::Expr eliminates string interpolation, record updates, when sugar. Similar to Ori’s single-phase approach. |
| Elm | Source → Canonical eliminates ports, effects, operators. Decision trees added in Optimized phase. |
| Scala | for-comprehensions desugar to flatMap/map/filter chains before type checking (the classic “desugar early” approach). |