Behavioral Differences

Magetypes maintains full API parity across x86-64, AArch64, and WASM — the same methods exist everywhere. But a few operations have different semantics between architectures. These are not bugs; they reflect real hardware differences.

Bitwise Operators

	x86-64	AArch64	WASM
`a & b`, `a | b`, `a ^ b`, `!a`	Trait impls (operators work)	Methods only	Methods only
`a.and(b)`, `a.or(b)`, `a.xor(b)`, `a.not()`	Works	Works	Works

Portable choice: Always use .and(), .or(), .xor(), .not() methods. They work on all platforms.

use magetypes::simd::{
    generic::f32x8,
    backends::F32x8Backend,
};

#[inline(always)]
fn bitwise_example<T: F32x8Backend>(a: f32x8<T>, b: f32x8<T>) {
    // Portable — works on all platforms
    let result = a.and(b);

    // x86-64 only — won't compile on ARM/WASM
    // let result = a & b;
}

Shift Right for Signed Integers

	x86-64	AArch64	WASM
`shr` on signed types (i8, i16, i32)	Logical (zero-fill)	Arithmetic (sign-extend)	Arithmetic (sign-extend)
`shr_arithmetic`	Sign-extending	Sign-extending	Sign-extending

Portable choice: Use shr_arithmetic when you want sign-extending behavior; it behaves the same everywhere. On signed types, shr zero-fills only on x86-64 and sign-extends on ARM/WASM, so do not rely on it for zero-fill in portable code.

use magetypes::simd::{
    generic::i32x4,
    backends::I32x4Backend,
};

#[inline(always)]
fn shift_example<T: I32x4Backend>(token: T) {
    let v = i32x4::<T>::splat(token, -8);

    // shr: behavior differs by platform
    let shifted = v.shr::<1>();
    // x86-64:  logical shift (zero-fill): 0x7FFF_FFFC = 2147483644 in every lane
    // ARM/WASM: arithmetic shift (sign-extend): -4 in every lane

    // shr_arithmetic: consistent everywhere
    let shifted = v.shr_arithmetic::<1>();
    // All platforms: -4
}

This difference exists because the x86-64 backend implements shr with logical shift instructions for all lane types, while the ARM NEON and WASM backends use arithmetic shifts for signed types.
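To see the two behaviors in isolation, here is a minimal scalar sketch in plain Rust. It does not use magetypes: the helper names `shr_logical_lane` and `shr_arithmetic_lane` are hypothetical and only model what a single i32 lane would hold after each kind of shift.

```rust
/// Logical right shift of one lane: vacated high bits are filled with zeros.
/// Hypothetical helper; models the zero-fill behavior listed for x86-64 `shr`.
/// `n` must be less than 32.
pub fn shr_logical_lane(x: i32, n: u32) -> i32 {
    // Reinterpret as unsigned so that `>>` shifts in zeros.
    ((x as u32) >> n) as i32
}

/// Arithmetic right shift of one lane: vacated high bits copy the sign bit.
/// Hypothetical helper; models the sign-extending behavior.
/// `n` must be less than 32.
pub fn shr_arithmetic_lane(x: i32, n: u32) -> i32 {
    // Rust's `>>` on signed integers is already an arithmetic shift.
    x >> n
}
```

For the `-8` input used above, `shr_logical_lane(-8, 1)` gives `2147483644` (`0x7FFF_FFFC`) and `shr_arithmetic_lane(-8, 1)` gives `-4`.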

Blend Signature

	x86-64 / AArch64	WASM
`blend(mask, true_val, false_val)`	`blend(true_val, false_val)` on mask	`self.blend(other, mask)` on value

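The method exists on all platforms, but the calling convention differs: on x86-64 and AArch64 it is called on the mask with the two values as arguments, while on WASM it is called on a value with the other value and the mask as arguments. Portable code that uses blend should be compiled and tested on every target platform.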

interleave_lo / interleave_hi

Available on f32x4 across all platforms. Not available on integer types.

use magetypes::simd::{
    generic::f32x4,
    backends::F32x4Backend,
};

#[inline(always)]
fn interleave_example<T: F32x4Backend>(a: f32x4<T>, b: f32x4<T>) {
    // Works everywhere
    let lo = a.interleave_lo(b);
    let hi = a.interleave_hi(b);
}
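As a rough mental model, the result lanes can be sketched with plain arrays. This assumes the methods follow the usual unpack-low / unpack-high lane order, which this page does not spell out, so treat the ordering as an assumption. The helper names are hypothetical and the sketch does not use magetypes.

```rust
/// Scalar model of interleaving the low halves of two 4-lane vectors.
/// Assumed lane order: a0, b0, a1, b1.
pub fn interleave_lo_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0], b[0], a[1], b[1]]
}

/// Scalar model of interleaving the high halves of two 4-lane vectors.
/// Assumed lane order: a2, b2, a3, b3.
pub fn interleave_hi_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[2], b[2], a[3], b[3]]
}
```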

Floating-Point Behavioral Differences

These differences arise from how hardware implements IEEE 754 operations. They affect specific edge cases (NaN, signed zero, near-zero cancellation) but not normal arithmetic.

Negation and Signed Zero

	x86-64	AArch64	WASM
`neg(0.0)`	`+0.0` (uses `sub(0, x)`)	`-0.0` (uses `vneg`)	`-0.0` (uses `f32x4_neg`)

x86 implements negation as 0 - x, which produces +0.0 when x is +0.0 (IEEE 754: +0 - +0 = +0). ARM and WASM flip the sign bit directly, so the same input gives -0.0. If the sign of zero matters for your algorithm, use bitwise XOR with a sign mask instead of the - operator.
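The two strategies can be compared on a single f32 in plain Rust. This is a scalar sketch, not magetypes code: `neg_by_subtraction` and `neg_by_sign_flip` are hypothetical names, and the second one is the scalar analogue of XOR with a sign mask.

```rust
/// Negation computed as `0 - x`. Returns +0.0 for both +0.0 and -0.0 inputs.
pub fn neg_by_subtraction(x: f32) -> f32 {
    0.0 - x
}

/// Negation computed by flipping the sign bit (XOR with the sign mask).
/// Maps +0.0 to -0.0 and -0.0 to +0.0.
pub fn neg_by_sign_flip(x: f32) -> f32 {
    const SIGN_MASK: u32 = 0x8000_0000;
    f32::from_bits(x.to_bits() ^ SIGN_MASK)
}
```

Comparing with `==` hides the difference, because `0.0 == -0.0` is true. Inspect `to_bits()` or `is_sign_negative()` when checking results.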

Min/Max NaN Propagation

	x86-64	AArch64 / WASM / Scalar
`min(NaN, x)`	Returns `x` (second operand)	Returns `x` (non-NaN value)
`min(x, NaN)`	Returns `NaN`	Returns `x` (non-NaN value)

SSE minps/maxps return the second operand whenever either operand is NaN, so the result depends on operand order: they neither consistently propagate NaN nor consistently return the non-NaN value. ARM and WASM (and scalar f32::min) always return the non-NaN value regardless of operand order. If your inputs may contain NaN, filter them first or use comparison + blend for consistent behavior.
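The order dependence is easy to reproduce with a scalar model. The sketch below is plain Rust, not magetypes code, and both function names are hypothetical. `min_sse_like` mirrors the `a < b ? a : b` selection rule of minps, and `min_prefer_number` is a scalar stand-in for the comparison + blend approach.

```rust
/// Scalar model of the minps selection rule: pick `a` only if `a < b`.
/// Any comparison involving NaN is false, so the second operand is returned.
pub fn min_sse_like(a: f32, b: f32) -> f32 {
    if a < b { a } else { b }
}

/// Order-independent minimum: returns the non-NaN operand when exactly one
/// operand is NaN, and NaN only when both are NaN.
pub fn min_prefer_number(a: f32, b: f32) -> f32 {
    // Take `a` when `b` is NaN or when `a` is strictly smaller.
    if b.is_nan() || a < b { a } else { b }
}
```

With these definitions, `min_sse_like(f32::NAN, 1.0)` is `1.0` but `min_sse_like(1.0, f32::NAN)` is NaN, while `min_prefer_number` returns `1.0` for both orders, matching `f32::min`.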

FMA vs Separate Multiply-Add

	x86-64 (AVX2+) / AArch64	WASM / Scalar fallback
`mul_add(a, b, c)`	Fused multiply-add (one rounding)	`a * b + c` (two roundings)

Hardware FMA computes a × b + c with a single rounding step, while the scalar/WASM fallback rounds the intermediate product before adding c. This matters most when a × b nearly cancels with c — the results can differ by many ULPs near zero. For most inputs the difference is sub-ULP. Accept small differences in dispatch parity tests.
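A worst-case cancellation can be constructed by hand. The following is a scalar sketch in plain Rust, using the standard library's `f32::mul_add` for the fused form. The function names and the chosen inputs are illustrative, not part of magetypes.

```rust
/// Two roundings: the product is rounded to f32 before `c` is added.
pub fn unfused_mul_add(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// One rounding: `a * b + c` is computed exactly, then rounded once.
pub fn fused_mul_add(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// Inputs where `a * b` nearly cancels with `c`.
/// Exactly, a * a = 1 + 2^-11 + 2^-24, which is a tie in f32 and rounds
/// to 1 + 2^-11, so the unfused sum with `c` collapses to zero.
pub fn cancellation_inputs() -> (f32, f32, f32) {
    let a = 1.0 + 1.0 / 4096.0; // 1 + 2^-12, exactly representable
    let c = -(1.0 + 1.0 / 2048.0); // -(1 + 2^-11), exactly representable
    (a, a, c)
}
```

For these inputs the unfused form returns `0.0` while the fused form returns `2^-24` (about `5.96e-8`). A parity test that compares backends on such inputs needs an absolute tolerance, since a ULP-based or relative one is meaningless when one result is zero.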

Comparison NaN Semantics (simd_ne)

	x86-64	AArch64 / WASM
`simd_ne(NaN, x)`	True (unordered: NaN ≠ anything)	May vary by implementation

The simd_ne operation uses the hardware's not-equal comparison, which may be "ordered" or "unordered" depending on platform. For portable NaN-aware inequality, use simd_eq + not.
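The two predicates can be told apart with a scalar model. This sketch is plain Rust rather than magetypes code, with hypothetical function names. `ne_unordered` is the scalar analogue of simd_eq followed by not.

```rust
/// "Unordered" not-equal: true when the operands differ OR either is NaN.
/// Equivalent to negating an equality test.
pub fn ne_unordered(a: f32, b: f32) -> bool {
    !(a == b)
}

/// "Ordered" not-equal: true only when both operands are numbers and differ.
/// Any comparison with NaN is false, so NaN inputs yield false.
pub fn ne_ordered(a: f32, b: f32) -> bool {
    a < b || a > b
}
```

The two functions agree on all non-NaN inputs and disagree only when at least one operand is NaN.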

Reduction Associativity (reduce_add)

Every backend's reduce_add sums all lanes, but the grouping may differ between the scalar fallback (left fold) and hardware (pairwise tree). Floating-point addition is not associative, so for inputs with large magnitude differences the results can differ by small relative errors (~1e-6). This is inherent to IEEE 754 and not a bug.
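The effect of grouping can be reproduced with four f32 values in plain Rust. This is a scalar sketch, not the magetypes implementation: the two function names are hypothetical, and the pairwise grouping shown (adjacent lanes first) is only one of several a hardware backend might use.

```rust
/// Left fold: ((((0 + v0) + v1) + v2) + v3).
pub fn reduce_add_left_fold(v: [f32; 4]) -> f32 {
    v.iter().fold(0.0, |acc, &x| acc + x)
}

/// Pairwise tree: (v0 + v1) + (v2 + v3).
pub fn reduce_add_pairwise(v: [f32; 4]) -> f32 {
    (v[0] + v[1]) + (v[2] + v[3])
}

/// Lanes with large magnitude differences. The exact sum is 2.0, but near
/// 1e8 adjacent f32 values are 8 apart, so adding 1.0 there is lost.
pub fn mixed_magnitude_lanes() -> [f32; 4] {
    [1.0e8, 1.0, -1.0e8, 1.0]
}
```

On these lanes the left fold returns `1.0` and the pairwise tree returns `0.0`. The gap is small relative to the `1e8` inputs, which is why parity tests for reductions should scale their tolerance by the magnitude of the inputs rather than of the result.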

Rounding Consistency (Fixed in 0.9.16)

As of version 0.9.16, round(), floor(), ceil(), and to_i32_round() produce identical results across all backends including the scalar fallback. Previously, the scalar backend used ties-away-from-zero for rounding while all hardware used ties-to-even (IEEE 754 default). This was fixed by implementing roundevenf in the scalar math library.
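The two tie-breaking rules differ only on exact halves. The sketch below shows them with the Rust standard library (`f32::round` and `f32::round_ties_even`, the latter requiring Rust 1.77 or newer). The wrapper names are hypothetical and the code does not use magetypes.

```rust
/// Ties away from zero: 0.5 -> 1.0, 2.5 -> 3.0.
/// This is the rule the text above attributes to the old scalar backend.
pub fn round_ties_away(x: f32) -> f32 {
    x.round()
}

/// Ties to even (IEEE 754 default): 0.5 -> 0.0, 2.5 -> 2.0, 3.5 -> 4.0.
pub fn round_half_even(x: f32) -> f32 {
    x.round_ties_even()
}
```

Inputs such as `0.5`, `1.5`, and `2.5` make good regression cases when checking that every backend agrees.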

Summary: Safe Portable Patterns

Operation	Portable Method
Bitwise AND	`.and()`
Bitwise OR	`.or()`
Bitwise XOR	`.xor()`
Bitwise NOT	`.not()`
Sign-extending right shift	`.shr_arithmetic::<N>()`
Arithmetic, comparisons, rounding	All operators and methods (bit-exact across backends, except the signed-zero and NaN cases above)
Reductions	All methods (tiny FP associativity differences possible)
FMA (`mul_add`/`mul_sub`)	All methods (sub-ULP difference for most inputs; can reach many ULPs near cancellation)
Transcendentals	All methods (tolerance-based parity)

The vast majority of magetypes operations are fully portable with identical semantics. The floating-point edge cases listed above affect only NaN, signed zero, and near-zero cancellation scenarios.
