As with all my handwritten notes, this has the usual disclaimer: these posts are just so I can use nice indexed search to find my notes, which are sufficient for me to recall talks and papers but probably not much use to anyone else. Paper here.
Prior research indicates that there is much spatial variation in applications’ memory access patterns. Modern memory systems, however, use small fixed-size cache blocks and as such cannot exploit the variation. Increasing the block size would not only prohibitively increase pin and interconnect bandwidth demands, but also increase the likelihood of false sharing in shared-memory multiprocessors. In this paper, we show that memory accesses in commercial workloads often exhibit repetitive layouts that span large memory regions (e.g., several kB), and these accesses recur in patterns that are predictable through codebased correlation. We propose Spatial Memory Streaming, a practical on-chip hardware technique that identifies codecorrelated spatial access patterns and streams predicted blocks to the primary cache ahead of demand misses. Using cycle-accurate full-system multiprocessor simulation of commercial and scientific applications, we demonstrate that Spatial Memory Streaming can on average predict 58% of L1 and 65% of off-chip misses, for a mean performance improvement of 37% and at best 307%.