Memory
Load, store, gather, scatter, and data layout patterns
Moving data between memory and SIMD registers efficiently. The difference between fast and slow SIMD code is usually in the memory access patterns, not the arithmetic.
- Load & Store — Unaligned, aligned, partial, streaming
- Gather & Scatter — Non-contiguous access and prefetch hints
- Interleaved Data — deinterleave_4ch, interleave_4ch for RGBA and similar
- Chunked Processing — Processing large arrays in SIMD-sized chunks, alignment, performance
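
As a rough illustration of two of the patterns listed above, here is a minimal, portable sketch in plain Rust, with no intrinsics. The names `LANES`, `chunked_sum`, and `deinterleave_rgba_scalar` are hypothetical and exist only for this example; they are not this library's API, and the scalar deinterleave is not the implementation behind `deinterleave_4ch`. The chunk width of 8 assumes `f32` lanes in a 256-bit register.

```rust
/// Lanes per chunk. 8 is an assumption: eight f32 values fill a 256-bit register.
const LANES: usize = 8;

/// Sum a slice in fixed-size chunks, then fold in the leftover tail.
/// Each full chunk has exactly LANES elements, so the inner loop has a
/// fixed trip count that a compiler can map onto vector adds.
fn chunked_sum(data: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let mut chunks = data.chunks_exact(LANES);
    for chunk in &mut chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }
    // Elements that did not fill a whole chunk are handled separately.
    let tail: f32 = chunks.remainder().iter().sum();
    acc.iter().sum::<f32>() + tail
}

/// Split interleaved RGBA bytes (R G B A R G B A ...) into four planar
/// channels. Trailing bytes that do not form a whole pixel are ignored.
fn deinterleave_rgba_scalar(pixels: &[u8]) -> [Vec<u8>; 4] {
    let n = pixels.len() / 4;
    let mut planes = [
        Vec::with_capacity(n),
        Vec::with_capacity(n),
        Vec::with_capacity(n),
        Vec::with_capacity(n),
    ];
    for px in pixels.chunks_exact(4) {
        for (plane, &v) in planes.iter_mut().zip(px) {
            plane.push(v);
        }
    }
    planes
}
```

Calling `chunked_sum` on a 19-element slice processes two full chunks of 8 and a 3-element tail. The planar output of `deinterleave_rgba_scalar` is the layout that lets each channel be loaded contiguously into a register, which is the reason for deinterleaving in the first place.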