The #[arcane] Macro

#[arcane] creates a safe wrapper around SIMD code. Use it at entry points—functions called from non-SIMD code (after summon(), from tests, public APIs).

For internal helpers called from other SIMD functions, use #[rite] instead—it has zero wrapper overhead.
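
For example, a minimal sketch of the split (it assumes #[rite] uses the same token-first signature as #[arcane], and it uses the magetypes f32x8 helpers shown later in this chapter):

use archmage::prelude::*;
use magetypes::simd::f32x8;

// Entry point: callable from ordinary code once a token has been summoned.
#[arcane]
fn sum_scaled(token: Desktop64, data: &[f32; 8], scale: f32) -> f32 {
    scale * horizontal_sum(token, data)
}

// Internal helper: only ever called from other SIMD functions, so no wrapper.
#[rite]
fn horizontal_sum(token: Desktop64, data: &[f32; 8]) -> f32 {
    let v = f32x8::from_array(token, *data);
    v.reduce_add()
}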

Basic Usage

#![allow(unused)]
fn main() {
use archmage::prelude::*;

#[arcane]
fn add_vectors(_token: Desktop64, a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    // safe_unaligned_simd takes references - fully safe inside #[arcane]!
    let va = _mm256_loadu_ps(a);
    let vb = _mm256_loadu_ps(b);
    let sum = _mm256_add_ps(va, vb);

    let mut out = [0.0f32; 8];
    _mm256_storeu_ps(&mut out, sum);
    out
}
}
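
Calling the entry point from ordinary code is then fully safe. A minimal sketch, obtaining the token with Desktop64::summon() as shown under Common Patterns below:

use archmage::prelude::*;

fn main() {
    let a = [1.0f32; 8];
    let b = [2.0f32; 8];
    // summon() verifies CPU support at runtime and returns a token on success.
    if let Some(token) = Desktop64::summon() {
        let sum = add_vectors(token, &a, &b);
        assert_eq!(sum, [3.0f32; 8]);
    }
}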

What It Generates

#![allow(unused)]
fn main() {
// Your code:
#[arcane]
fn add(token: Desktop64, a: __m256, b: __m256) -> __m256 {
    _mm256_add_ps(a, b)
}

// Generated:
fn add(token: Desktop64, a: __m256, b: __m256) -> __m256 {
    #[target_feature(enable = "avx2,fma,bmi1,bmi2")]
    #[inline]
    unsafe fn __inner(token: Desktop64, a: __m256, b: __m256) -> __m256 {
        _mm256_add_ps(a, b)  // Safe inside #[target_feature]!
    }
    // SAFETY: Token proves CPU support was verified
    unsafe { __inner(token, a, b) }
}
}

Token-to-Features Mapping

Token                      Enabled Features
X64V2Token                 sse3, ssse3, sse4.1, sse4.2, popcnt
X64V3Token / Desktop64     V2 features + avx, avx2, fma, bmi1, bmi2, f16c
X64V4Token / Server64      V3 features + avx512f, avx512bw, avx512cd, avx512dq, avx512vl
NeonToken / Arm64          neon
Simd128Token               simd128

Nesting #[arcane] Functions

Functions with the same token type inline into each other:

#![allow(unused)]
fn main() {
use archmage::prelude::*;
use magetypes::simd::f32x8;

#[arcane]
fn outer(token: Desktop64, data: &[f32; 8]) -> f32 {
    let sum = inner(token, data);  // Inlines!
    sum * 2.0
}

#[arcane]
fn inner(token: Desktop64, data: &[f32; 8]) -> f32 {
    // Both functions share the same #[target_feature] region
    // LLVM optimizes across both
    let v = f32x8::from_array(token, *data);
    v.reduce_add()
}
}

Downcasting Tokens

Higher tokens can call functions expecting lower tokens:

#![allow(unused)]
fn main() {
use archmage::prelude::*;
use magetypes::simd::f32x8;

#[arcane]
fn v4_kernel(token: X64V4Token, data: &[f32; 8]) -> f32 {
    // V4 ⊃ V3, so this works and inlines properly
    v3_sum(token, data)
    // ... could do AVX-512 specific work too ...
}

#[arcane]
fn v3_sum(token: X64V3Token, data: &[f32; 8]) -> f32 {
    // Actual SIMD: load 8 floats, horizontal sum
    let v = f32x8::from_array(token, *data);
    v.reduce_add()
}
}

Cross-Platform Stubs

On non-matching architectures, #[arcane] generates a stub:

#![allow(unused)]
fn main() {
// On ARM, this becomes:
#[cfg(not(target_arch = "x86_64"))]
fn add(token: Desktop64, a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
    unreachable!("Desktop64 cannot exist on this architecture")
}
}

The stub compiles but can never be reached—Desktop64::summon() returns None on ARM.

Options

inline_always

Force aggressive inlining (requires nightly):

#![allow(unused)]
#![feature(target_feature_inline_always)]

fn main() {
#[arcane(inline_always)]
fn hot_path(token: Desktop64, data: &[f32]) -> f32 {
    // Uses #[inline(always)] instead of #[inline]
    data.iter().sum() // placeholder body so the example type-checks
}
}

Common Patterns

Public API with Internal Implementation

#![allow(unused)]
fn main() {
pub fn process(data: &mut [f32]) {
    if let Some(token) = Desktop64::summon() {
        process_simd(token, data);
    } else {
        process_scalar(data);
    }
}

#[arcane]
fn process_simd(token: Desktop64, data: &mut [f32]) {
    // SIMD implementation
}

fn process_scalar(data: &mut [f32]) {
    // Fallback
}
}

Generic Over Tokens

#![allow(unused)]
fn main() {
use archmage::HasX64V2;
use std::arch::x86_64::*;

#[arcane]
fn generic_impl<T: HasX64V2>(token: T, a: __m128, b: __m128) -> __m128 {
    // Works with X64V2Token, X64V3Token, X64V4Token
    // Note: generic bounds create optimization boundaries
    _mm_add_ps(a, b)
}
}

Warning: Generic bounds prevent inlining across the boundary. Prefer concrete tokens for hot paths.
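
One way to keep a hot path monomorphic is to dispatch once at the entry point and call an #[arcane] function that takes a concrete token. A minimal sketch (the function names are illustrative), reusing the f32x8 helpers from earlier in this chapter:

use archmage::prelude::*;
use magetypes::simd::f32x8;

pub fn sum8(data: &[f32; 8]) -> f32 {
    if let Some(token) = Desktop64::summon() {
        sum8_simd(token, data)  // concrete token: no generic boundary, inlines freely
    } else {
        data.iter().sum()       // scalar fallback
    }
}

#[arcane]
fn sum8_simd(token: Desktop64, data: &[f32; 8]) -> f32 {
    let v = f32x8::from_array(token, *data);
    v.reduce_add()
}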