Memory

Load, store, gather, scatter, and data layout patterns

This section covers moving data between memory and SIMD registers efficiently. The difference between fast and slow SIMD code usually lies in the memory access patterns, not the arithmetic.

Load & Store — Unaligned, aligned, partial, streaming
Gather & Scatter — Non-contiguous access and prefetch hints
Interleaved Data — deinterleave_4ch, interleave_4ch for RGBA and similar
Chunked Processing — Processing large arrays in SIMD-sized chunks, alignment, performance
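
As a rough illustration of two of the patterns listed above, here is a minimal sketch in plain Rust of chunked processing with a scalar tail and of splitting interleaved 4-channel data into planes. It uses only the standard library and relies on the compiler to auto-vectorize; it is not the library's own API. The names `LANES`, `chunked_sum`, and `deinterleave_rgba` are hypothetical, and the lane count of 8 assumes 256-bit registers holding `f32` values.

```rust
/// Hypothetical lane count: 8 x f32 fills one 256-bit register (assumption).
const LANES: usize = 8;

/// Sum a slice in LANES-sized chunks, then handle the leftover tail
/// with a scalar loop. Each chunk is a contiguous, fixed-size read,
/// which is the access pattern SIMD loads favor.
fn chunked_sum(data: &[f32]) -> f32 {
    // One partial sum per lane, mirroring a vector accumulator.
    let mut acc = [0.0f32; LANES];

    let mut chunks = data.chunks_exact(LANES);
    for chunk in &mut chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }

    // Elements that did not fill a whole chunk.
    let tail: f32 = chunks.remainder().iter().sum();

    // Horizontal reduction of the per-lane partial sums.
    acc.iter().sum::<f32>() + tail
}

/// Split interleaved RGBA bytes (R0 G0 B0 A0 R1 G1 ...) into four
/// planes, one per channel. Trailing bytes that do not form a full
/// pixel are ignored.
fn deinterleave_rgba(pixels: &[u8]) -> [Vec<u8>; 4] {
    let n = pixels.len() / 4;
    let mut planes = [
        Vec::with_capacity(n),
        Vec::with_capacity(n),
        Vec::with_capacity(n),
        Vec::with_capacity(n),
    ];
    for px in pixels.chunks_exact(4) {
        for (plane, &v) in planes.iter_mut().zip(px) {
            plane.push(v);
        }
    }
    planes
}
```

Keeping one partial sum per lane avoids a serial dependency on a single accumulator, and the planar layout produced by the second function lets later passes read each channel contiguously. Note that the per-lane accumulation changes the order of floating-point additions, so results can differ slightly from a strictly sequential sum.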
