Crystal: A Unified Cache Storage System for Analytical Databases

Proceedings of the VLDB Endowment

Cloud analytical databases employ a disaggregated storage model, in which the elastic compute layer accesses data persisted on remote cloud storage in block-oriented columnar formats. Given the high latency and low bandwidth of remote storage and the limited size of fast local storage, caching data at the compute node is important, and this has renewed interest in caching for analytics. Today, each DBMS builds its own caching solution, usually based on file- or block-level LRU. In this paper, we advocate a new architecture: a smart cache storage system called Crystal that is co-located with the compute layer. Crystal's clients are DBMS-specific "data sources" with push-down predicates. Similar in spirit to a DBMS, Crystal incorporates query processing and optimization components, focusing on efficient caching and serving of single-table hyper-rectangles called regions. Results show that Crystal, with a small DBMS-specific data source connector, can significantly improve query latencies on unmodified Spark and Greenplum while also saving bandwidth from remote storage.
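To make the notion of a region concrete, the following is a minimal sketch, under assumptions of our own, of how a conjunction of range predicates pushed down by a data source might be represented as a single-table hyper-rectangle, and how a cache could decide whether an already cached region fully covers a new request. The names `Region` and `RegionCache`, the closed-interval encoding, the assumption that every interval is non-empty, and the first-match lookup are all hypothetical illustrations, not Crystal's actual data structures or matching algorithm.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical single-table hyper-rectangle.

    Each constrained column maps to a closed interval (lo, hi); columns absent
    from `bounds` are treated as unconstrained, i.e. (-inf, +inf).
    Assumes every interval is non-empty (lo <= hi).
    """
    table: str
    bounds: dict = field(default_factory=dict)

    def interval(self, column):
        # Unconstrained columns span the whole domain.
        return self.bounds.get(column, (-math.inf, math.inf))

    def contains(self, other):
        """True if every row satisfying `other` also satisfies `self`."""
        if self.table != other.table:
            return False
        # Check every column constrained by either region, dimension by dimension.
        for col in set(self.bounds) | set(other.bounds):
            lo, hi = self.interval(col)
            olo, ohi = other.interval(col)
            if olo < lo or ohi > hi:
                return False
        return True


class RegionCache:
    """Toy cache: answers a request from any cached region that contains it."""

    def __init__(self):
        self._regions = []

    def add(self, region):
        self._regions.append(region)

    def lookup(self, query):
        # First match wins; a real system would also rank candidates and
        # handle partial overlaps, which this sketch ignores.
        for cached in self._regions:
            if cached.contains(query):
                return cached
        return None


cache = RegionCache()
cache.add(Region("lineitem", {"l_shipdate": (19940101, 19941231)}))
# Narrower date range plus an extra predicate: covered by the cached region.
hit = cache.lookup(Region("lineitem", {"l_shipdate": (19940301, 19940331),
                                       "l_quantity": (0, 10)}))
# Date range extends before the cached one: not covered.
miss = cache.lookup(Region("lineitem", {"l_shipdate": (19931201, 19940131)}))
print(hit is not None, miss is None)  # True True
```

Containment is checked per column because a hyper-rectangle is the product of independent intervals, so one request is covered by a cached region exactly when each of its intervals lies inside the corresponding cached interval. Adding a predicate to a request only shrinks it, which is why the narrower request above still hits.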