Proposal: std.i18n — Internationalization Library
Status: Draft
Created: 2026-04-02
Author: Eric (with AI assistance)
Affects: Standard library, library/std/i18n/
Depends on: stdlib-text-api-proposal.md (approved) — shares Unicode foundation
Prior art: ICU4X (Unicode Consortium), Go x/text, C# System.Globalization, Java java.text, CLDR
Summary
This proposal defines std.i18n, a standard library package for locale-aware internationalization operations: collation (locale-sensitive sorting), number formatting, date/time formatting, currency formatting, plural rules, and message formatting. The package is backed by CLDR (Unicode Common Locale Data Repository) data.
Motivation
std.text (approved) provides Unicode algorithms that are locale-independent: grapheme segmentation, normalization, case folding, display width. These work identically regardless of the user’s language.
Locale-dependent operations are a separate concern:
- Collation: “ä” sorts with “a” in German but after “z” in Swedish
- Number formatting: 1,234.56 (English) vs 1.234,56 (German) vs 1 234,56 (French)
- Date formatting: 04/02/2026 (US) vs 02/04/2026 (UK) vs 2026年4月2日 (Japan)
- Currency: $1,234 vs 1.234 € vs ¥1,234
- Plural rules: “1 file” vs “2 files” (English, 2 forms) vs “1 файл” / “2 файла” / “5 файлов” (Russian, 3 forms for whole numbers)
These require CLDR data (~500KB minimum for common locales, ~5MB for full coverage) and are architecturally distinct from std.text’s pure Unicode algorithms.
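To make the number formatting difference above concrete, here is a minimal sketch in Python (used instead of Ori because the std.i18n API is not yet specified). The SEPARATORS table is a hand-written stand-in for CLDR data covering only three languages, and format_decimal is a hypothetical helper, not part of this proposal. The French group separator is assumed to be U+202F (narrow no-break space), as in recent CLDR releases.

```python
# Hand-written stand-in for CLDR number symbols (assumption: real data
# would come from CLDR and would also cover grouping sizes, digits, etc.).
SEPARATORS = {
    "en": (",", "."),        # (group separator, decimal separator)
    "de": (".", ","),
    "fr": ("\u202f", ","),   # narrow no-break space as group separator
}


def format_decimal(value: float, language: str, digits: int = 2) -> str:
    """Format value with language-specific group and decimal separators."""
    group, decimal = SEPARATORS[language]
    # Start from Python's locale-independent "1,234.56" form.
    plain = f"{value:,.{digits}f}"
    integer_part, _, fraction_part = plain.partition(".")
    # Swap in the locale's group separator, then re-attach the fraction.
    integer_part = integer_part.replace(",", group)
    if fraction_part:
        return integer_part + decimal + fraction_part
    return integer_part


print(format_decimal(1234.56, "en"))  # 1,234.56
print(format_decimal(1234.56, "de"))  # 1.234,56
```

Even this toy version needs per-locale data, which is the architectural difference from std.text that the paragraph above describes.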
Scope
In Scope
- Collation: Locale-sensitive string comparison and sorting (UTS #10)
- Number formatting: Decimal, percent, scientific, compact (CLDR patterns)
- Date/time formatting: Date, time, datetime with locale patterns (CLDR)
- Currency formatting: Currency symbol placement, grouping (CLDR)
- Plural rules: Cardinal and ordinal plural forms (CLDR)
- Message formatting: ICU MessageFormat-style parameterized messages
- List formatting: “A, B, and C” vs “A, B und C” (CLDR)
- Locale type: BCP 47 language tags, locale matching
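As an illustration of the Locale type item, the following Python sketch parses the language, script, and region subtags of a BCP 47 tag and performs a simple lookup by truncating subtags from the right (the lookup scheme described in RFC 4647). LocaleTag, parse_tag, and lookup are hypothetical names chosen for this sketch; variants, extensions, and private-use subtags are deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocaleTag:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None

    def __str__(self) -> str:
        parts = (self.language, self.script, self.region)
        return "-".join(p for p in parts if p)


def _ascii_alpha(s: str, length: int) -> bool:
    return len(s) == length and s.isascii() and s.isalpha()


def parse_tag(tag: str) -> LocaleTag:
    """Parse the language[-script][-region] prefix of a BCP 47 tag."""
    subtags = tag.replace("_", "-").split("-")
    language = subtags[0].lower()
    if not (_ascii_alpha(language, 2) or _ascii_alpha(language, 3)):
        raise ValueError(f"unsupported language subtag: {subtags[0]!r}")
    script = region = None
    rest = subtags[1:]
    # Script: four letters, conventionally title case (e.g. "Hant").
    if rest and _ascii_alpha(rest[0], 4):
        script = rest.pop(0).title()
    # Region: two letters or three digits (e.g. "DE", "419").
    if rest and (_ascii_alpha(rest[0], 2)
                 or (len(rest[0]) == 3 and rest[0].isascii()
                     and rest[0].isdigit())):
        region = rest.pop(0).upper()
    return LocaleTag(language, script, region)


def lookup(requested: str, available: set) -> Optional[str]:
    """Return the closest available tag, dropping subtags from the right."""
    parts = str(parse_tag(requested)).split("-")
    while parts:
        candidate = "-".join(parts)
        if candidate in available:
            return candidate
        parts.pop()
    return None


print(parse_tag("zh_hant_tw"))           # zh-Hant-TW
print(lookup("de-AT", {"de", "en"}))     # de
```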
Out of Scope
- Calendar systems beyond Gregorian (Islamic, Hebrew, etc.) — future version
- Transliteration (Cyrillic → Latin, etc.) — future version
- Bidirectional text — covered by std.text.unicode.bidi
- Text segmentation — covered by std.text.unicode.segmentation
API Sketch
use std.i18n { Locale, DateStyle, collation_sort, format_number, format_date, format_message }
use std.time { DateTime }
let locale = Locale.from_tag("de-DE")
// Locale-aware sorting
let sorted = collation_sort(["Zürich", "Aachen", "Österreich"], locale:)
// → ["Aachen", "Österreich", "Zürich"] (Ö sorts with O in German)
// Number formatting
format_number(1234.56, locale:) // → "1.234,56"
// Date formatting
let now = DateTime.now_utc()
format_date(now, locale:, style: DateStyle.Long) // → "2. April 2026" (when run on 2026-04-02)
// Plural-aware messages
format_message("{count, plural, one {# file} other {# files}}", count: 5, locale:)
// → "5 files"
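The plural selection behind a message like the one above can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule functions hard-code the CLDR cardinal rules for non-negative whole numbers in English and Russian, the FORMS table and format_count are hypothetical, and no ICU MessageFormat parsing is attempted.

```python
def plural_category_en(n: int) -> str:
    # English cardinal rule for whole numbers: "one" for 1, else "other".
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    # Russian cardinal rules for non-negative whole numbers.
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    return "many"


RULES = {"en": plural_category_en, "ru": plural_category_ru}

# Hypothetical message table: one word form per plural category.
FORMS = {
    "en": {"one": "file", "other": "files"},
    "ru": {"one": "файл", "few": "файла", "many": "файлов"},
}


def format_count(n: int, language: str) -> str:
    """Pick the plural form for n and build a '<n> <noun>' string."""
    category = RULES[language](n)
    return f"{n} {FORMS[language][category]}"


print(format_count(5, "en"))   # 5 files
print(format_count(21, "ru"))  # 21 файл
```

The Russian case shows why a simple singular/plural switch is not enough: 21 takes the same form as 1, while 11 takes the same form as 5.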
Data Strategy
CLDR Data Packaging
Options to evaluate:
- Ship full CLDR data (~5MB) as part of std.i18n package
- Ship common locales (~500KB for top 20 locales) with on-demand download for others
- Data-at-build-time — ori build downloads locale data based on ori.toml configuration
- ICU4X backend — delegate to ICU4X which has optimized, tree-shakeable locale data
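As one way to picture the second option (bundled common locales plus on-demand loading), here is a small Python sketch. LocaleStore and its fetch callback are hypothetical names; the actual download mechanism, data format, and caching policy are open design questions and are not specified by this proposal.

```python
from typing import Callable, Optional

LocaleData = dict  # placeholder for whatever structure CLDR data takes


class LocaleStore:
    """Serve bundled locale data first, then fall back to a fetch callback."""

    def __init__(
        self,
        bundled: dict,
        fetch: Optional[Callable[[str], Optional[LocaleData]]] = None,
    ) -> None:
        self._cache = dict(bundled)   # copy so the bundled set is not mutated
        self._fetch = fetch

    def get(self, tag: str) -> LocaleData:
        if tag in self._cache:
            return self._cache[tag]
        if self._fetch is not None:
            data = self._fetch(tag)
            if data is not None:
                self._cache[tag] = data   # fetch each locale at most once
                return data
        raise KeyError(f"no locale data for {tag!r}")


# Usage: "de" is bundled, "sv" is fetched on first access only.
remote = {"sv": {"decimal": ","}}
store = LocaleStore({"de": {"decimal": ","}}, fetch=remote.get)
print(store.get("sv"))
```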
Capability
Locale-sensitive operations may need to detect the system locale:
@system_locale () -> Locale uses Env
// Reads LC_ALL / LANG environment variables
Pure operations that take an explicit locale: parameter need no capability.
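The environment lookup mentioned in the comment above could work roughly as in this Python sketch. It takes the environment as an explicit argument, mirroring the idea that environment access is a capability. The function name, the "und" default (the BCP 47 subtag for an undetermined language), and the handling of the C/POSIX locale are assumptions. Only LC_ALL and LANG are consulted, as in the comment; POSIX also checks category variables such as LC_MESSAGES between the two, which this sketch ignores.

```python
def system_locale_tag(env: dict, default: str = "und") -> str:
    """Derive a BCP 47-style tag from POSIX locale environment variables."""
    for name in ("LC_ALL", "LANG"):
        value = env.get(name, "")
        if not value:
            continue  # unset or empty: try the next variable
        # "de_DE.UTF-8@euro" -> "de_DE": drop codeset and modifier.
        base = value.split(".", 1)[0].split("@", 1)[0]
        if base in ("C", "POSIX"):
            return default  # no language information available
        return base.replace("_", "-")
    return default


print(system_locale_tag({"LC_ALL": "de_DE.UTF-8", "LANG": "en_US.UTF-8"}))
# de-DE
```

A real implementation would call it with the process environment (for example `system_locale_tag(dict(os.environ))` in Python) and would likely map modifiers such as `@latin` to script subtags instead of dropping them.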
Detailed Design
To be expanded during full proposal development.
Open Questions
- Should collation be in std.i18n.collation or std.text.collation? (Recommendation: std.i18n since it needs CLDR data)
- ICU4X as FFI backend vs pure Ori + CLDR data files?
- How to handle locale data distribution? Ship with package or download on demand?
- Should std.i18n re-export locale-independent std.text functions for convenience?
- What is the minimum viable locale set to ship in the package?