Reductions

Reductions collapse all lanes of a vector into a single scalar value.

Available Reductions

use magetypes::simd::{generic::f32x8, backends::F32x8Backend};

#[inline(always)]
fn reduction_example<T: F32x8Backend>(token: T) {
    let v = f32x8::<T>::from_array(token, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);

    let sum = v.reduce_add();  // 36.0 — sum of all lanes
    let max = v.reduce_max();  // 8.0  — maximum lane value
    let min = v.reduce_min();  // 1.0  — minimum lane value
}

These work on float types (f32x4<T>, f32x8<T>, f64x2<T>, etc.) and integer types (i32x4<T>, i32x8<T>, u32x4<T>, etc.).

Boolean Reductions

For integer mask types, check whether any or all lanes are set:

let mask = a.simd_lt(b);

let any = mask.any_true();  // true if any lane comparison was true
let all = mask.all_true();  // true if every lane comparison was true
let bits = mask.bitmask();  // Bit pattern of which lanes are true

bitmask() returns an integer where bit N corresponds to lane N. On an 8-lane vector, bitmask() returns a u8.

Example: Find Maximum Value in a Large Array

use archmage::{X64V3Token, SimdToken, arcane};
use magetypes::simd::{generic::f32x8, backends::F32x8Backend};

#[arcane]
fn find_max(token: X64V3Token, data: &[f32]) -> f32 {
    let chunks = data.chunks_exact(8);
    let remainder = chunks.remainder();

    // SIMD reduction over full chunks
    let mut max_v = f32x8::<X64V3Token>::splat(token, f32::NEG_INFINITY);
    for chunk in chunks {
        let v = f32x8::<X64V3Token>::from_slice(token, chunk);
        max_v = max_v.max(v);
    }

    // Reduce vector to scalar
    let mut result = max_v.reduce_max();

    // Handle remainder
    for &x in remainder {
        if x > result {
            result = x;
        }
    }

    result
}

This processes 8 floats per iteration in the SIMD loop, then handles any leftover elements with scalar code.

Found an error or it needs a clarification? Open an issue on GitHub.
Substantiated corrections will be incorporated with attribution.