An Effective DRAM Cache Architecture for Scale-Out Servers

MSR-TR-2016-20

Scale-out workloads are characterized by in-memory datasets and, consequently, massive memory footprints. Due to the abundance of request-level parallelism in these workloads, recent research advocates manycore architectures to maximize throughput while maintaining quality of service. On-die stacked DRAM caches have been proposed to provide the bandwidth required by manycore servers through caching of secondary data working sets. However, the hot working sets that arise from power-law dataset access distributions far exceed the capacity these caches provide, which precludes their effective deployment in servers and calls for high-capacity cache architectures.
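To make the capacity argument concrete, the following back-of-the-envelope model estimates the hot working set under a power-law (Zipf) access distribution. This is a minimal sketch, not taken from the paper: the object count, object size, and Zipf exponent are assumed values chosen only for illustration.

```python
# Illustrative Zipf working-set model; all parameters below are
# assumptions for illustration, not measurements from the paper.
import numpy as np

N_ITEMS = 10_000_000   # assumed number of objects in the dataset
ITEM_BYTES = 4096      # assumed average object size
ALPHA = 0.99           # assumed Zipf exponent

ranks = np.arange(1, N_ITEMS + 1, dtype=np.float64)
weights = ranks ** -ALPHA                       # unnormalized access probabilities
coverage = np.cumsum(weights) / weights.sum()   # fraction of accesses to top-k objects

print(f"total footprint: {N_ITEMS * ITEM_BYTES / 2**30:.0f} GiB")
for target in (0.5, 0.8, 0.95):
    k = int(np.searchsorted(coverage, target)) + 1
    print(f"{target:.0%} of accesses hit the top {k:,} objects "
          f"(~{k * ITEM_BYTES / 2**30:.1f} GiB)")
```

Even under heavy skew, covering most accesses in such a model takes gibibytes to tens of gibibytes of cache, beyond what on-die stacked DRAM offers but within reach of the high-capacity caches argued for here.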

In this work, we find that while emerging high-bandwidth memory technology falls short of providing enough capacity to serve as system memory, it is an excellent substrate for high-capacity caches. We also find that the long cache residency periods enabled by high-capacity caches uncover significant spatial locality across objects. Based on these findings, we introduce Scale-Out Cache (soCache), a distributed cache composed of multiple high-bandwidth memory modules. Each soCache module uses a page-based organization that exploits spatial locality while minimizing tag storage requirements. By storing the tags in SRAM on the logic die of the high-bandwidth memory modules, soCache avoids the prohibitive complexity of the in-DRAM metadata used by state-of-the-art DRAM caches. In 14nm technology, soCache reduces system energy by 1.4-4.4x and improves throughput by 28-44% over state-of-the-art memory systems.
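To illustrate why a page-based organization makes SRAM-resident tags practical, the sketch below compares tag-array storage for block-grain and page-grain organizations of a direct-mapped cache. Everything here is an assumption for illustration: the 8 GiB module capacity, 48-bit physical addresses, 4 metadata bits per entry, the direct-mapped geometry, and the helper name tag_array_bytes are not from the paper.

```python
# Hypothetical tag-storage estimate for a direct-mapped DRAM cache;
# all parameters are illustrative assumptions, not figures from the paper.
def tag_array_bytes(cache_bytes: int, grain_bytes: int,
                    phys_addr_bits: int = 48, meta_bits: int = 4) -> float:
    entries = cache_bytes // grain_bytes          # one tag per cached block/page
    offset_bits = grain_bytes.bit_length() - 1    # log2(grain size)
    index_bits = entries.bit_length() - 1         # log2(entries), direct-mapped
    tag_bits = phys_addr_bits - offset_bits - index_bits
    return entries * (tag_bits + meta_bits) / 8   # bytes of tags plus metadata

CACHE_BYTES = 8 * 2**30                           # assumed 8 GiB HBM module
for grain, label in ((64, "64 B blocks"), (4096, "4 KiB pages")):
    mib = tag_array_bytes(CACHE_BYTES, grain) / 2**20
    print(f"{label}: ~{mib:.0f} MiB of tag storage")
```

Under these assumptions, moving from 64 B blocks to 4 KiB pages shrinks the tag array from roughly 300 MiB to about 5 MiB, a size that plausibly fits in SRAM on the logic die.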