Memory

Load, store, gather, scatter, and data layout patterns

Moving data between memory and SIMD registers efficiently. The difference between fast and slow SIMD code is usually in the memory access patterns, not the arithmetic.

  1. Load & Store — Unaligned, aligned, partial, streaming
  2. Gather & Scatter — Non-contiguous access and prefetch hints
  3. Interleaved Data — deinterleave_4ch and interleave_4ch for RGBA and other 4-channel layouts
  4. Chunked Processing — Processing large arrays in SIMD-sized chunks, with notes on alignment and performance
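To show the layout change that interleaving and deinterleaving perform, here is a minimal portable sketch. It assumes 8-bit channels packed as R0 G0 B0 A0 R1 G1 B1 A1 and so on. The function names match the list above, but these signatures and bodies are hypothetical scalar stand-ins, not the library's API. A real SIMD version would handle many pixels per instruction, for example with a structured 4-element load on NEON or byte shuffles on SSE/AVX.

```rust
/// Hypothetical scalar sketch of a 4-channel deinterleave:
/// packed RGBA bytes -> four planar channel buffers [R, G, B, A].
/// Trailing bytes that do not form a full 4-byte pixel are ignored.
fn deinterleave_4ch(src: &[u8]) -> [Vec<u8>; 4] {
    let n = src.len() / 4;
    let mut planes = [
        Vec::with_capacity(n),
        Vec::with_capacity(n),
        Vec::with_capacity(n),
        Vec::with_capacity(n),
    ];
    for px in src.chunks_exact(4) {
        for c in 0..4 {
            planes[c].push(px[c]);
        }
    }
    planes
}

/// Hypothetical scalar sketch of the inverse: four planar channels -> packed RGBA.
/// Only as many pixels as the shortest plane are written.
fn interleave_4ch(planes: [&[u8]; 4]) -> Vec<u8> {
    let n = planes.iter().map(|p| p.len()).min().unwrap_or(0);
    let mut out = Vec::with_capacity(4 * n);
    for i in 0..n {
        for p in &planes {
            out.push(p[i]);
        }
    }
    out
}
```

Planar data lets each channel be processed with contiguous vector loads, which is why code often deinterleaves once, runs the arithmetic, and interleaves back.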
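The chunked-processing pattern can be sketched as follows: walk the array in fixed-width blocks, "load" each block into a register-sized array, operate on it, "store" it back, then finish the leftover elements with a scalar tail loop. The lane count of 8 (eight f32 values, one 256-bit register) and the name `scale_in_place` are illustrative assumptions. The fixed-size array only stands in for a SIMD register, and whether the compiler actually emits vector instructions depends on the target and optimization level.

```rust
/// Assumed lane count: 8 x f32 = 256 bits. Adjust for the target's register width.
const LANES: usize = 8;

/// Hypothetical example: multiply every element by `factor` in SIMD-sized chunks.
fn scale_in_place(data: &mut [f32], factor: f32) {
    let mut chunks = data.chunks_exact_mut(LANES);
    for chunk in &mut chunks {
        // "Load": copy one full chunk into a register-sized array.
        let mut v = [0.0f32; LANES];
        v.copy_from_slice(chunk);
        // Fixed-length loop over the lanes; a good candidate for auto-vectorization.
        for x in v.iter_mut() {
            *x *= factor;
        }
        // "Store": write the processed lanes back.
        chunk.copy_from_slice(&v);
    }
    // Scalar tail: the final len % LANES elements that do not fill a chunk.
    for x in chunks.into_remainder() {
        *x *= factor;
    }
}
```

For example, a 19-element slice is handled as two full 8-lane chunks plus a 3-element scalar tail.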