Proposal: Intrinsics Capability

Status: Approved
Author: Eric (with AI assistance)
Created: 2026-01-30
Approved: 2026-01-30
Affects: Compiler, capabilities, low-level operations


Summary

This proposal formalizes the Intrinsics capability for low-level, platform-specific operations including SIMD, bit manipulation, and hardware feature detection.


Scope

This proposal covers:

  • SIMD operations (arithmetic, comparisons, reductions)
  • Bit manipulation operations
  • CPU feature detection

Deferred to separate proposal:

  • Atomic operations (require integration with memory model)
  • Memory operations (prefetch, memory_fence)

Capability Definition

Intrinsics is a capability trait providing low-level hardware operations:

trait Intrinsics {
    // SIMD float operations (128-bit / 4-wide)
    @simd_add_f32x4 (a: [float, max 4], b: [float, max 4]) -> [float, max 4]
    @simd_sub_f32x4 (a: [float, max 4], b: [float, max 4]) -> [float, max 4]
    @simd_mul_f32x4 (a: [float, max 4], b: [float, max 4]) -> [float, max 4]
    @simd_div_f32x4 (a: [float, max 4], b: [float, max 4]) -> [float, max 4]
    @simd_min_f32x4 (a: [float, max 4], b: [float, max 4]) -> [float, max 4]
    @simd_max_f32x4 (a: [float, max 4], b: [float, max 4]) -> [float, max 4]
    @simd_sqrt_f32x4 (a: [float, max 4]) -> [float, max 4]
    @simd_abs_f32x4 (a: [float, max 4]) -> [float, max 4]
    @simd_eq_f32x4 (a: [float, max 4], b: [float, max 4]) -> [bool, max 4]
    @simd_lt_f32x4 (a: [float, max 4], b: [float, max 4]) -> [bool, max 4]
    @simd_gt_f32x4 (a: [float, max 4], b: [float, max 4]) -> [bool, max 4]
    @simd_sum_f32x4 (a: [float, max 4]) -> float  // Horizontal sum

    // SIMD float operations (256-bit / 8-wide, AVX)
    @simd_add_f32x8 (a: [float, max 8], b: [float, max 8]) -> [float, max 8]
    @simd_sub_f32x8 (a: [float, max 8], b: [float, max 8]) -> [float, max 8]
    @simd_mul_f32x8 (a: [float, max 8], b: [float, max 8]) -> [float, max 8]
    @simd_div_f32x8 (a: [float, max 8], b: [float, max 8]) -> [float, max 8]
    @simd_min_f32x8 (a: [float, max 8], b: [float, max 8]) -> [float, max 8]
    @simd_max_f32x8 (a: [float, max 8], b: [float, max 8]) -> [float, max 8]
    @simd_sqrt_f32x8 (a: [float, max 8]) -> [float, max 8]
    @simd_abs_f32x8 (a: [float, max 8]) -> [float, max 8]
    @simd_eq_f32x8 (a: [float, max 8], b: [float, max 8]) -> [bool, max 8]
    @simd_lt_f32x8 (a: [float, max 8], b: [float, max 8]) -> [bool, max 8]
    @simd_gt_f32x8 (a: [float, max 8], b: [float, max 8]) -> [bool, max 8]
    @simd_sum_f32x8 (a: [float, max 8]) -> float

    // SIMD float operations (512-bit / 16-wide, AVX-512)
    @simd_add_f32x16 (a: [float, max 16], b: [float, max 16]) -> [float, max 16]
    @simd_sub_f32x16 (a: [float, max 16], b: [float, max 16]) -> [float, max 16]
    @simd_mul_f32x16 (a: [float, max 16], b: [float, max 16]) -> [float, max 16]
    @simd_div_f32x16 (a: [float, max 16], b: [float, max 16]) -> [float, max 16]
    @simd_min_f32x16 (a: [float, max 16], b: [float, max 16]) -> [float, max 16]
    @simd_max_f32x16 (a: [float, max 16], b: [float, max 16]) -> [float, max 16]
    @simd_sqrt_f32x16 (a: [float, max 16]) -> [float, max 16]
    @simd_abs_f32x16 (a: [float, max 16]) -> [float, max 16]
    @simd_eq_f32x16 (a: [float, max 16], b: [float, max 16]) -> [bool, max 16]
    @simd_lt_f32x16 (a: [float, max 16], b: [float, max 16]) -> [bool, max 16]
    @simd_gt_f32x16 (a: [float, max 16], b: [float, max 16]) -> [bool, max 16]
    @simd_sum_f32x16 (a: [float, max 16]) -> float

    // SIMD 64-bit integer operations (128-bit / 2-wide)
    @simd_add_i64x2 (a: [int, max 2], b: [int, max 2]) -> [int, max 2]
    @simd_sub_i64x2 (a: [int, max 2], b: [int, max 2]) -> [int, max 2]
    @simd_mul_i64x2 (a: [int, max 2], b: [int, max 2]) -> [int, max 2]
    @simd_min_i64x2 (a: [int, max 2], b: [int, max 2]) -> [int, max 2]
    @simd_max_i64x2 (a: [int, max 2], b: [int, max 2]) -> [int, max 2]
    @simd_eq_i64x2 (a: [int, max 2], b: [int, max 2]) -> [bool, max 2]
    @simd_lt_i64x2 (a: [int, max 2], b: [int, max 2]) -> [bool, max 2]
    @simd_gt_i64x2 (a: [int, max 2], b: [int, max 2]) -> [bool, max 2]
    @simd_sum_i64x2 (a: [int, max 2]) -> int

    // SIMD 64-bit integer operations (256-bit / 4-wide, AVX2)
    @simd_add_i64x4 (a: [int, max 4], b: [int, max 4]) -> [int, max 4]
    @simd_sub_i64x4 (a: [int, max 4], b: [int, max 4]) -> [int, max 4]
    @simd_mul_i64x4 (a: [int, max 4], b: [int, max 4]) -> [int, max 4]
    @simd_min_i64x4 (a: [int, max 4], b: [int, max 4]) -> [int, max 4]
    @simd_max_i64x4 (a: [int, max 4], b: [int, max 4]) -> [int, max 4]
    @simd_eq_i64x4 (a: [int, max 4], b: [int, max 4]) -> [bool, max 4]
    @simd_lt_i64x4 (a: [int, max 4], b: [int, max 4]) -> [bool, max 4]
    @simd_gt_i64x4 (a: [int, max 4], b: [int, max 4]) -> [bool, max 4]
    @simd_sum_i64x4 (a: [int, max 4]) -> int

    // Bit operations
    @count_leading_zeros (value: int) -> int
    @count_trailing_zeros (value: int) -> int
    @count_ones (value: int) -> int
    @rotate_left (value: int, amount: int) -> int
    @rotate_right (value: int, amount: int) -> int

    // Hardware queries
    @cpu_has_feature (feature: str) -> bool
}

Usage

Function Declaration

@fast_dot_product (a: [float], b: [float]) -> float uses Intrinsics =
    // Use SIMD intrinsics for vectorized computation
    ...

Capability Provision

The default def impl Intrinsics uses native instructions when available, falling back to scalar emulation when not:

// Uses default (NativeWithFallback)
@compute () -> float uses Intrinsics =
    Intrinsics.simd_add_f32x4(a: vec1, b: vec2)

Override for testing or explicit control:

with Intrinsics = EmulatedIntrinsics {} in
    fast_operation()  // Always uses scalar fallback

Conditional Use with Feature Detection

@dot_product (a: [float], b: [float]) -> float uses Intrinsics =
    if Intrinsics.cpu_has_feature(feature: "avx2") then
        avx2_dot_product(a, b)
    else if Intrinsics.cpu_has_feature(feature: "sse4.1") then
        sse4_dot_product(a, b)
    else
        scalar_dot_product(a, b)

Compile-Time Platform Targeting

#target(arch: "x86_64")
@fast_checksum (data: [byte]) -> int uses Intrinsics =
    Intrinsics.crc32(data: data)

#target(not_arch: "x86_64")
@fast_checksum (data: [byte]) -> int =
    data.fold(initial: 0, op: (acc, b) -> acc ^ (b as int))

SIMD Operations

Vector Types

SIMD operations work on fixed-capacity lists:

  • [float, max 4] — 128-bit (SSE, NEON)
  • [float, max 8] — 256-bit (AVX, AVX2)
  • [float, max 16] — 512-bit (AVX-512)
  • [int, max 2] — 128-bit i64 (SSE2)
  • [int, max 4] — 256-bit i64 (AVX2)

The int type is 64-bit in Ori, so integer SIMD uses i64 lanes.
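
As a minimal sketch of integer-lane usage (the sum_of_maxes name is illustrative only, not part of the proposal), the 2-wide i64 operations compose the same way as the float ones:

// Element-wise max of two 2-lane vectors, then a horizontal sum
@sum_of_maxes (a: [int, max 2], b: [int, max 2]) -> int uses Intrinsics = {
    let maxes = Intrinsics.simd_max_i64x2(a: a, b: b)
    Intrinsics.simd_sum_i64x2(a: maxes)
}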

Core Operations

| Category   | Float Operations   | Int Operations |
|------------|--------------------|----------------|
| Arithmetic | add, sub, mul, div | add, sub, mul  |
| Comparison | eq, lt, gt         | eq, lt, gt     |
| Min/Max    | min, max           | min, max       |
| Math       | sqrt, abs          |                |
| Reduction  | sum                | sum            |

Example: SIMD Dot Product

@simd_dot_4 (a: [float, max 4], b: [float, max 4]) -> float uses Intrinsics = {
    let products = Intrinsics.simd_mul_f32x4(a: a, b: b)
    Intrinsics.simd_sum_f32x4(a: products)
}
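
A call with concrete vectors (values chosen purely for illustration) evaluates as expected:

// 1.0*4.0 + 2.0*3.0 + 3.0*2.0 + 4.0*1.0 == 20.0
simd_dot_4(a: [1.0, 2.0, 3.0, 4.0], b: [4.0, 3.0, 2.0, 1.0])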

Platform Availability

| Target  | 128-bit (x4)   | 256-bit (x8) | 512-bit (x16) |
|---------|----------------|--------------|---------------|
| x86_64  | SSE (baseline) | AVX/AVX2     | AVX-512       |
| aarch64 | NEON           |              |               |
| wasm32  | SIMD128        |              |               |

Bit Manipulation

Operations

// Number of set bits (population count)
@count_ones (value: int) -> int uses Intrinsics

// Number of leading zero bits
@count_leading_zeros (value: int) -> int uses Intrinsics

// Number of trailing zero bits
@count_trailing_zeros (value: int) -> int uses Intrinsics

// Bitwise rotation
@rotate_left (value: int, amount: int) -> int uses Intrinsics
@rotate_right (value: int, amount: int) -> int uses Intrinsics

Example

@is_power_of_two (n: int) -> bool uses Intrinsics =
    n > 0 && Intrinsics.count_ones(value: n) == 1
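
The leading-zero count composes in the same way; as a sketch (the bit_width name is illustrative), the number of bits needed to represent a positive value:

// Bit width of a positive integer, e.g. bit_width(n: 255) == 8
@bit_width (n: int) -> int uses Intrinsics =
    64 - Intrinsics.count_leading_zeros(value: n)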

Hardware Feature Detection

Runtime Detection

@cpu_has_feature (feature: str) -> bool uses Intrinsics

Valid feature strings:

| Platform | Features |
|----------|----------|
| x86_64   | "sse", "sse2", "sse3", "sse4.1", "sse4.2", "avx", "avx2", "avx512f" |
| aarch64  | "neon" |
| wasm32   | "simd128" |

Unknown feature strings cause a panic; when the string is a literal, the compiler can report the error ahead of time (see E1062 below).

Example

@optimized_compute (data: [float]) -> float uses Intrinsics =
    if Intrinsics.cpu_has_feature(feature: "avx2") then
        avx2_compute(data)
    else if Intrinsics.cpu_has_feature(feature: "sse4.1") then
        sse4_compute(data)
    else
        scalar_compute(data)

Platform-Specific Behavior

Auto-Fallback (Default)

The default def impl Intrinsics provides NativeWithFallback:

  • Uses native SIMD instructions when the operation is supported on the current platform
  • Falls back to scalar emulation when not supported
  • Always works, but may be slower on emulated paths

// This always works, even on platforms without AVX
Intrinsics.simd_add_f32x8(a, b)  // Uses AVX if available, emulates otherwise

Explicit Control

For performance-critical code, use feature detection to select optimal paths:

@fast_path (data: [float]) -> float uses Intrinsics =
    if Intrinsics.cpu_has_feature(feature: "avx2") then
        // Known to use native AVX2
        avx2_implementation(data)
    else
        // Known to use scalar
        scalar_implementation(data)

Implementation Providers

| Provider           | Behavior                                          |
|--------------------|---------------------------------------------------|
| NativeWithFallback | Native when available, scalar fallback (default)  |
| EmulatedIntrinsics | Always uses scalar operations (for testing)       |
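
As a sketch of how a test might pin the scalar provider (the check_dot_product_scalar name and the earlier simd_dot_4 helper are illustrative assumptions, not part of the proposal):

// Force scalar emulation so the result does not depend on the host CPU
@check_dot_product_scalar () -> bool uses Intrinsics =
    with Intrinsics = EmulatedIntrinsics {} in
        simd_dot_4(a: [1.0, 2.0, 3.0, 4.0], b: [4.0, 3.0, 2.0, 1.0]) == 20.0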

Safety Guarantees

No Undefined Behavior

Unlike C intrinsics, Ori intrinsics:

  • Check input sizes at runtime
  • Panic on invalid inputs (not UB)
  • Don’t allow arbitrary memory access

SIMD Safety

SIMD operations check their input sizes; a wrong-sized vector panics rather than producing undefined behavior:

Intrinsics.simd_add_f32x4(a: [1.0, 2.0], b: [...])  // panic: expected 4 elements

Bit Operation Safety

Rotation amounts are taken modulo 64:

Intrinsics.rotate_left(value: 1, amount: 65)  // Same as amount: 1

Error Messages

Missing Capability

error[E1060]: `simd_add_f32x4` requires `Intrinsics` capability
  --> src/main.ori:5:5
   |
 5 |     Intrinsics.simd_add_f32x4(a, b)
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ requires `uses Intrinsics`
   |
   = help: add `uses Intrinsics` to function signature

Unknown Feature

error[E1062]: unknown CPU feature `"avx3"`
  --> src/main.ori:5:45
   |
 5 |     if Intrinsics.cpu_has_feature(feature: "avx3") then
   |                                            ^^^^^^ unknown feature
   |
   = note: valid features for x86_64: "sse", "sse2", "sse3", "sse4.1", "sse4.2", "avx", "avx2", "avx512f"

Wrong Vector Size

error[E1063]: SIMD operation requires exactly 4 elements
  --> src/main.ori:5:5
   |
 5 |     Intrinsics.simd_add_f32x4(a: [1.0, 2.0], b: b)
   |                                  ^^^^^^^^^^ has 2 elements

Spec Changes Required

Update 14-capabilities.md

Add Intrinsics to standard capabilities table:

| Capability | Purpose | Suspends |
|------------|---------|----------|
| `Intrinsics` | Low-level SIMD and bit operations | No |

Add section describing available operations and platform behavior.


Implementation

Phase 6.14: Intrinsics Capability

  1. Add Intrinsics trait to prelude
  2. Implement def impl Intrinsics with NativeWithFallback
  3. Add EmulatedIntrinsics provider
  4. Implement SIMD codegen for LLVM backend
  5. Add feature detection for x86_64, aarch64, wasm32
  6. Add comprehensive tests

LLVM Backend

SIMD and bit operations map to LLVM vector instructions and intrinsics:

  • simd_add_f32x4 → fadd <4 x float>
  • count_ones → llvm.ctpop.i64
  • cpu_has_feature → Runtime CPUID check

Summary

| Category   | Operations |
|------------|------------|
| SIMD Float | add, sub, mul, div, min, max, sqrt, abs, eq, lt, gt, sum |
| SIMD Int   | add, sub, mul, min, max, eq, lt, gt, sum |
| Widths     | 4-wide (128-bit), 8-wide (256-bit), 16-wide (512-bit) |
| Bit        | count_ones, count_leading_zeros, count_trailing_zeros, rotate_left, rotate_right |
| Query      | cpu_has_feature |

| Aspect    | Behavior |
|-----------|----------|
| Safety    | Bounds-checked, panic on invalid inputs |
| Fallback  | Auto-emulation (default), explicit via EmulatedIntrinsics |
| Detection | cpu_has_feature for runtime, #target for compile-time |
| Provider  | Capability trait with def impl |

Design Decisions

  1. Atomics deferred — Atomic operations require integration with Ori’s memory model and proper pointer types. A separate proposal will address these.
  2. Auto-fallback default — The default def impl uses native instructions when available and emulates otherwise, ensuring code always works across platforms.
  3. 64-bit integers only — Integer SIMD uses Ori’s native int (i64) to avoid truncation complexity.
  4. String-based feature detection — Simple cpu_has_feature("avx2") pattern with documented valid strings and panic on unknown features.
  5. Core operation set — Includes arithmetic, comparisons, min/max, math (sqrt/abs), and horizontal sum. More exotic operations (shuffle, blend, FMA) can be added in future proposals.

Errata (added 2026-03-05)

Superseded by intrinsics-v2-byte-simd-proposal: This proposal’s explicit-width naming scheme (simd_add_f32x4, simd_add_i64x2, etc.) is replaced by a generic API (simd_add<T, $N>). Key changes:

  1. Generic API: All operations are generic over lane type T and width $N. The compiler monomorphizes based on the fixed-capacity list type at the call site.
  2. Float lane width fix: f32 in names was incorrect — Ori’s float is f64. [float, max 2] = 128-bit (not [float, max 4] as this proposal stated).
  3. Mask<$N> type: Comparison operations now return Mask<$N> instead of [bool, max N].
  4. Byte SIMD added: 12 byte-level operations at 128/256/512-bit widths.
  5. std.bytes module: High-level byte search functions backed by SIMD.
  6. V1 names deprecated: Explicit-width names remain as aliases but emit deprecation warnings.