Unexpected Cache Performance

LWN, an excellent weekly publication about Linux and the OSS world in general, has started running a series of articles on the subtleties of memory from a programmer's point of view.

What every programmer should know about memory, Part 1

Memory part 2: CPU caches

The analysis of cache performance in multithreaded code surprised me. On a multicore CPU with a shared L2 cache, threads writing to the same data actually perform terribly if the working set fits within the L1 d-cache. Figure 3.27: Core 2 Bandwidth with 2 Threads shows this very nicely. Performance improves again once the working set spills into the L2, where cache line locking isn't an issue anymore. Figure 3.29: AMD Fam 10h Bandwidth with 2 Threads shows that this isn't limited to the Core 2 either. On that CPU the L2 isn't shared, so the L3 is the sweet spot for multithreaded performance.
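If you want to poke at this on your own machine, a rough microbenchmark in the spirit of the article's two-thread write test could look like the sketch below. It is not the article's code. It assumes 64-byte cache lines, uses hypothetical names (`run_writers`, `writer`, `CACHE_LINE`), and writes one byte per cache line with relaxed atomic stores so the shared writes are well-defined C while still forcing each core to take ownership of the line.

```c
/* Sketch only: N threads repeatedly write to the SAME buffer and we time
   how long a fixed amount of writing takes for a given working-set size.
   Build with something like: cc -O2 -pthread bench.c */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */
#define MAX_THREADS 16

struct writer_args {
    atomic_uchar *buf;  /* buffer shared by every writer thread */
    size_t size;        /* working-set size in bytes */
    size_t total;       /* bytes of buffer each thread sweeps in total */
};

/* Sweep the shared buffer, storing one byte per cache line, until
   `total` bytes worth of buffer have been covered. Each store needs the
   line in this core's cache in exclusive/modified state. */
static void *writer(void *p)
{
    struct writer_args *a = p;
    size_t done = 0;
    while (done < a->total) {
        for (size_t i = 0; i < a->size; i += CACHE_LINE)
            atomic_store_explicit(&a->buf[i], (unsigned char)done,
                                  memory_order_relaxed);
        done += a->size;
    }
    return NULL;
}

/* Run `nthreads` writers over one shared buffer of `size` bytes and
   return the elapsed wall-clock time in seconds, or -1.0 on failure. */
double run_writers(int nthreads, size_t size, size_t total)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    assert(size >= CACHE_LINE && total >= size);

    atomic_uchar *buf = malloc(size * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < size; i++)
        atomic_init(&buf[i], 0);

    struct writer_args args = { buf, size, total };
    pthread_t tid[MAX_THREADS];
    int started = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tid[t], NULL, writer, &args) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)   /* join only threads that exist */
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(buf);
    if (started != nthreads)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

To use it, compare `run_writers(1, 16 * 1024, 1 << 28)` with `run_writers(2, 16 * 1024, 1 << 28)`, then repeat with a buffer sized for the L2 (say 1 MiB) and one well beyond it. Each thread does the same amount of writing, so ideally the two-thread run takes about as long as the single-thread one; if your CPU behaves like the ones in the figures, the gap should be worst for the L1-sized buffer. Actual numbers depend heavily on the CPU, cache sizes, and where the scheduler places the threads.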
