SILK: Preventing Latency Spikes in Log-Structured Merge Key-Value Stores

LSM-based KV stores are designed to offer good write performance by capturing client writes in memory and only later flushing them to storage. Writes are later compacted into a tree-like data structure on disk to improve read performance and to reduce storage space use. It has been widely documented that compactions severely hamper throughput. Various optimizations have successfully dealt with this problem. These techniques include, among others, rate-limiting flushes and compactions, selecting among compactions for maximum effect, and limiting compactions to the highest level by so-called fragmented LSMs.

In this work we focus on latencies rather than throughput. The root cause of high tail latencies is interference between client writes, flushes and compactions. We introduce the notion of an I/O scheduler for an LSM-based KV store to reduce this interference. We explore three techniques as part of this I/O scheduler: 1) opportunistically allocating more bandwidth to internal operations during periods of low load, 2) prioritizing flushes and compactions at the lower levels of the tree, and 3) preempting compactions.
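The three techniques can be illustrated with a toy model. The sketch below is not SILK's implementation: the class `ToyIOScheduler`, the priority constants, and the fixed one-second tick are all hypothetical names and simplifications chosen for illustration. It assumes a single shared bandwidth budget, gives internal work whatever client traffic leaves unused, always serves the highest-priority pending job, and lets a newly arrived flush displace a running compaction without losing the compaction's progress.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical priority classes for this sketch: a lower number runs first.
FLUSH, L0_COMPACTION, DEEP_COMPACTION = 0, 1, 2


@dataclass(order=True)
class Job:
    priority: int
    name: str = field(compare=False)
    remaining_mb: float = field(compare=False)


class ToyIOScheduler:
    """Toy model only; real schedulers work on finer-grained I/O requests."""

    def __init__(self, total_mbps):
        self.total_mbps = total_mbps
        self.queue = []  # min-heap ordered by Job.priority

    def submit(self, job):
        heapq.heappush(self.queue, job)

    def internal_budget(self, client_mbps):
        # Technique 1: internal operations get the bandwidth that
        # client traffic is not using, so they speed up under low load.
        return max(0.0, self.total_mbps - client_mbps)

    def tick(self, client_mbps, seconds=1.0):
        """Advance one interval; return the name of the job served, or None."""
        budget_mb = self.internal_budget(client_mbps) * seconds
        if not self.queue or budget_mb <= 0:
            return None
        # Techniques 2 and 3: always serve the highest-priority job.
        # A lower-priority compaction stays queued with its progress
        # intact, so it is effectively paused and resumed later.
        job = self.queue[0]
        job.remaining_mb -= budget_mb
        if job.remaining_mb <= 0:
            heapq.heappop(self.queue)
        return job.name


sched = ToyIOScheduler(total_mbps=200)
sched.submit(Job(DEEP_COMPACTION, "compact-L3", 300))
print(sched.tick(client_mbps=150))  # compaction runs on leftover bandwidth
sched.submit(Job(FLUSH, "flush-memtable", 64))
print(sched.tick(client_mbps=150))  # the flush takes over from the compaction
```

In this model, any budget left over when a job finishes mid-tick is simply discarded, which keeps the sketch short at the cost of some realism.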

SILK is a new open-source KV store that incorporates this notion of an I/O scheduler. SILK is derived from RocksDB, but the concepts can be applied to other LSM-based KV stores. We demonstrate that SILK achieves up to two orders of magnitude lower 99th percentile latencies than RocksDB and TRIAD, without any significant negative effects on other performance metrics.

Speaker Details

Oana Balmau is a PhD student at the University of Sydney, advised by Prof. Willy Zwaenepoel. She completed her Bachelor's and Master's studies in Computer Science at EPFL, Switzerland. Her research focuses on distributed systems, storage, and concurrency, with an emphasis on optimizations for key-value stores.

Date:
Speakers: Oana Balmau
Affiliation: University of Sydney

Series: Microsoft Research Talks