Fat Caches for Scale-Out Servers

IEEE Micro, Vol 37(2): pp. 90-103

Emerging scale-out servers are characterized by massive memory footprints and bandwidth requirements. On-chip stacked DRAM caches have been proposed to provide the required bandwidth for manycore servers through caching of secondary data working sets. However, the disparity between the capacity they provide and the working set sizes precludes their effective deployment in servers, calling for high-capacity cache architectures. High-capacity caches, enabled by the emergence of high-bandwidth memory technologies, exhibit high spatiotemporal locality due to coarse-grained access patterns and long cache residency periods stemming from skewed dataset access distributions. This spatiotemporal behavior favors a page-based organization that naturally exploits spatial locality while minimizing tag storage requirements and enabling a practical in-SRAM tag array architecture. By storing tags in SRAM, such caches avoid the complexity of in-DRAM metadata found in state-of-the-art DRAM caches.
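
The tag-storage argument can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not taken from the article: the 8 GiB cache capacity, the 4-byte tag entry, and the 64-byte block and 4 KiB page sizes are all assumed values chosen for illustration, and the model ignores any per-block state (such as valid or dirty bits) that a real page-based design might keep.

```python
GiB = 1 << 30
MiB = 1 << 20

def tag_array_bytes(cache_bytes: int, unit_bytes: int, tag_entry_bytes: int) -> int:
    """Tag storage for a cache that keeps one tag entry per allocation unit."""
    num_entries = cache_bytes // unit_bytes
    return num_entries * tag_entry_bytes

# All values below are illustrative assumptions, not figures from the article.
CACHE_BYTES = 8 * GiB     # assumed DRAM cache capacity
TAG_ENTRY_BYTES = 4       # assumed size of one tag entry (tag plus state)
BLOCK_BYTES = 64          # assumed block-based allocation unit
PAGE_BYTES = 4096         # assumed page-based allocation unit

block_tags = tag_array_bytes(CACHE_BYTES, BLOCK_BYTES, TAG_ENTRY_BYTES)
page_tags = tag_array_bytes(CACHE_BYTES, PAGE_BYTES, TAG_ENTRY_BYTES)

print(f"block-based tags: {block_tags // MiB} MiB")  # 512 MiB
print(f"page-based tags:  {page_tags // MiB} MiB")   # 8 MiB
print(f"reduction factor: {block_tags // page_tags}x")  # 64x
```

Under these assumptions, moving from block to page granularity shrinks the tag array by the ratio of the two allocation sizes, which is what makes an SRAM-resident tag array plausible at high cache capacities.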