Daniel S. Berger

Senior Researcher

Caching with Delayed Hits [SICOMM’20]

We show that traditional caching strategies – even so called ‘optimal’ algorithms – can fail to minimize latency in the presence of delayed hits. We design a new, latency-optimal offline caching algorithm called Belatedly which reduces average latencies by up to 45% compared to the traditional, hit-rate optimal Belady’s algorithm.

Learning Relaxed Belady for Content Distribution Network Caching [NSDI’20]

This project presents a new approach for caching in CDNs that
uses machine learning to approximate the Belady MIN (oracle) algorithm.
To accomplish this complex task, we propose
a CDN cache design called Learning Relaxed Belady (LRB)
to mimic a Relaxed Belady algorithm, using the concept of
Belady boundary. We also propose a metric called good decision ratio to help us make better design decisions. In addition,
the paper addresses several challenges to build an end-to-end
machine learning caching prototype, including how to gather
training data, limit memory overhead, and have lightweight
training and prediction.

We have implemented an LRB simulator and a prototype
within Apache Traffic Server. Our simulation results with 6
production CDN traces show that LRB reduces WAN traffic
compared to a typical production CDN cache design by 4–
25%, and consistently outperform other state-of-the-art methods. Our evaluation of the LRB prototype shows its overhead
is modest and it can be deployed on today’s CDN servers.

Code: on github (opens in new tab)
Data: two large Wikipedia CDN request traces publicly available (from a CDN server in San Francisco)
- Wiki 2018 (opens in new tab)
- Wiki 2019 (opens in new tab)

LFO: Towards Lightweight and Robust Machine Learning for CDN Caching [HotNets’18]

Recent advances in the field of reinforcement learning promise
a general approach to optimize networking systems. This paper
argues against the recent trend for generalization by introducing
a case study where domain-specific modeling enables
the application of lightweight and robust learning techniques.

We study CDN caching systems, which make a good case
for optimization as their performance directly affects operational
costs, while currently relying on many hand-tuned parameters.
In caching, reinforcement learning has been shown
to perform suboptimally when compared to simple heuristics.
A key challenge is that rewards (cache hits) manifest with
large delays, which prevents timely feedback to the learning
algorithm and introduces significant complexity.

This paper shows how to significantly simplify this problem
by explicitly modeling optimal caching decisions (OPT).
While prior work considered deriving OPT impractical, recent
theoretical modeling advances change this assumption.
Modeling OPT enables even lightweight decision trees to
outperform state-of-the-art CDN caching heuristics.

Towards Lightweight and Robust Machine Learning for CDN Caching (opens in new tab)
Code: building blocks on github (opens in new tab) (this is a workshop paper and still under heavy development)
Data: publicly available CDN request trace (from a CDN server in San Francisco serving Wikipedia production traffic) used in the evaluation
- cdn1 (opens in new tab) (backup link on s3 (opens in new tab))
- Format: LZMA-compressed (opens in new tab) space-separated table. Three columns: request number, anonymized object-id, object size in bytes

RobinHood: Tail Latency Aware Caching [OSDI’18]

Tail latency is of great importance in user-facing web services.
However, maintaining low tail latency is challenging,
because a single request to a web application server
results in multiple queries to complex, diverse backend
services (databases, recommender systems, ad systems,
etc.). A request is not complete until all of its queries have
completed. We analyze a Microsoft production system
and find that backend query latencies vary by more than
two orders of magnitude across backends and over time,
resulting in high request tail latencies.

We propose a novel solution for maintaining low request
tail latency: repurpose existing caches to mitigate
the effects of backend latency variability, rather than just
caching popular data. Our solution, RobinHood, dynamically
reallocates cache resources from the cache-rich
(backends which don’t affect request tail latency) to the
cache-poor (backends which affect request tail latency).
We evaluate RobinHood with production traces on a 50-
server cluster with 20 different backend systems. Surprisingly,
we find that RobinHood can directly address
tail latency even if working sets are much larger than the
cache size. In the presence of load spikes, RobinHood
meets a 150ms P99 goal 99.7% of the time, whereas the
next best policy meets this goal only 70% of the time.

Flow Offline Optimal (FOO) Analysis Tool [Sigmetrics’18]

Many recent caching systems aim to improve miss ratios, but there is no good sense among practitioners of how much further miss ratios can be improved. In other words, should the systems community continue working on this problem?

Currently, there is no principled answer to this question. In
practice, object sizes often vary by several orders of magnitude,
where computing the optimal miss ratio (OPT) is known to be
NP-hard. The few known results on caching with variable object
sizes provide very weak bounds and are impractical to compute on
traces of realistic length.

We propose a new method to compute upper and lower bounds
on OPT. Our key insight is to represent caching as a min-cost flow
problem, hence we call our method the flow-based offline optimal
(FOO). We prove that, under simple independence assumptions,
FOO’s bounds become tight as the number of objects goes to infinity.
Indeed, FOO’s error over 10M requests of production CDN and
storage traces is negligible: at most 0.3%. FOO thus reveals, for the
first time, the limits of caching with variable object sizes.
While FOO is very accurate, it is computationally impractical on
traces with hundreds of millions of requests. We therefore extend
FOO to obtain more efficient bounds on OPT, which we call practical
flow-based offline optimal (PFOO). We evaluate PFOO on several
full production traces and use it to compare OPT to prior online
policies. This analysis shows that current caching systems are in
fact still far from optimal, suffering 11–43% more cache misses than
OPT, whereas the best prior offline bounds suggest that there is
essentially no room for improvement.

Paper: Practical Bounds on Optimal Caching with Variable Object Sizes, ACM SIGMETRICS 2018 (opens in new tab)
Code: FOO, PFOO, many tools, and prior algorithms (opens in new tab) in C++ on github
Data: popularity distribution, reuse-distance distribution, and object-size distribution for each of the production traces used in the evaluation:
- cdn1 (opens in new tab), cdn2 (opens in new tab), cdn3 (opens in new tab), webapp1 (opens in new tab), webapp2 (opens in new tab), storage1 (opens in new tab), storage2 (opens in new tab), storage3 (opens in new tab)
- Format: LZMA-compressed (opens in new tab) space-separated table. Generated with this C++ script (statistics.cpp) (opens in new tab). Three columns:
  - if first column = “rd”, then second column is log10(reuse distance), and third column counts how often this reuse distance occured
  - if first column = “pop”, then second column is log10(popularity), and third column counts how often this popularity occured
  - if first column = “size” (“uniqsize”), then second column is log10(object size), and third column counts how many requests (how many objects) have this size in bytes
Data: we make request trace from a Wikipedia production server publicly available, this trace is used in the evaluation with the name “cdn1”. The other traces have been used in the prior litearture, e.g., the storage traces are from the SNIO IOTTA repository (opens in new tab).
- cdn1 (opens in new tab) (backup link on s3 (opens in new tab))
- Format: LZMA-compressed (opens in new tab) space-separated table. Three columns: request number, anonymized object-id, object size in bytes

AdaptSize – A Robust High-Performance CDN Caching System [NSDI’17]

Most major content providers use content delivery networks
(CDNs) to serve web and video content to their
users. A CDN is a large distributed system of servers
that caches and delivers content to users. The first-level
cache in a CDN server is the memory-resident Hot Object
Cache (HOC). A major goal of a CDN is to maximize the
object hit ratio (OHR) of its HOCs. But, the small size
of the HOC, the huge variance in the requested object
sizes, and the diversity of request patterns make this goal
challenging.

We propose AdaptSize, the first adaptive, size-aware
cache admission policy for HOCs that achieves a high
OHR, even when object size distributions and request
characteristics vary significantly over time. At the core of
AdaptSize is a novel Markov cache model that seamlessly
adapts the caching parameters to the changing request
patterns. Using request traces from one of the largest
CDNs in the world, we show that our implementation of
AdaptSize achieves significantly higher OHR than widelyused
production systems: 30-48% and 47-91% higher
OHR than Nginx and Varnish, respectively. AdaptSize
also achieves 33-46% higher OHR than state-of-the-art
research systems. Further, AdaptSize is more robust to
changing request patterns than the traditional tuning approach
of hill climbing and shadow queues studied in
other contexts.

Paper: AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN , USENIX NSDI 2017 (opens in new tab)
Code: AdaptSize source code and experimental setup (opens in new tab) in C on github
Code: Simulator for many state-of-the-art caching systems (opens in new tab) in C++ on github
Data: see the CDN traces from our SIGMETRICS paper above

SNCMeister Controller to Meet Tail Latency SLOs in data centers [SOCC’16]

Meeting tail latency Service Level Objectives (SLOs) in shared cloud networks is known to be
an important and challenging problem. The main challenge is determining limits on the multitenancy
such that SLOs are met. This requires calculating latency guarantees, which is a difficult
problem, especially when tenants exhibit bursty behavior as is common in production environments.
Nevertheless, recent papers in the past two years (Silo, QJump, and PriorityMeister) show techniques
for calculating latency based on a branch of mathematical modeling called Deterministic Network
Calculus (DNC). The DNC theory is designed for adversarial worst-case conditions, which is
sometimes necessary, but is often overly conservative. Typical tenants do not require strict worstcase
guarantees, but are only looking for SLOs at lower percentiles (e.g., 99th, 99.9th).

This
paper describes SNC-Meister, a new admission control system for tail latency SLOs. SNC-Meister
improves upon the state-of-the-art DNC-based systems by using a new theory, Stochastic Network
Calculus (SNC), which is designed for tail latency percentiles. Focusing on tail latency percentiles,
rather than the adversarial worst-case DNC latency, allows SNC-Meister to pack together many
more tenants: in experiments with production traces, SNC-Meister supports 75% more tenants than
the state-of-the-art. We are the first to bring SNC to practice in a real computer system.

Paper: SNC-Meister: Admitting More Tenants with
Tail Latency SLOs, ACM SoCC 2016 (opens in new tab)
Code: SNC-Meister source code (opens in new tab) in C on github

NIDD: Non-Intrusive Delay Differentiation Scheduler [ITC’16]

Modern packet-switched communication networks use resource sharing via statistical multiplexing, which requires routers to buffer packets temporarily when they cannot be forwarded immediately. Packet buffering mitigates the risk for packet loss, but contributes to network latency. The Non-Intrusive Delay Differentiation (NIDD) project seeks to offer applications a choice of different delay guarantees without side effects.

Friendly Jamming on OTS Wi-Fi Access Points [TWC’16,WiSec’14]

Frequency jamming is the fiercest attack tool to disrupt wireless communication and its malicious aspects have received much attention in the literature. Yet, several recent works propose to turn the table and employ so-called friendly jamming for the benefit of a wireless network. For example, recently proposed friendly jamming applications include hiding communication channels, injection attack defense, and access control. This project investigates the practical viability of friendly jamming by applying it in a real-world network.

This project develops a friendly jamming application on customer grade access points (the cheap WRT54GL, skip text and download here) and conducted a three weeks real-world study on the jammer’s performance and side-effects on legitimate traffic (the cost of jamming) in a university office environment. Our results provide detailed insights on crucial factors governing the tradeoff between the effectiveness of friendly jamming (we evaluated up to 13 jammers) and its cost.

Paper: Friendly Jamming on Access Points: Analysis and Real-World Measurements, IEEE Transactions on Wireless Communication, 2016 (opens in new tab)
Paper: Gaining Insight on Friendly Jamming in a Real-world IEEE 802.11 Network, ACM Wireless Security, 2014 (opens in new tab)
Code: B43 assembly source code to be flashed on Wi-Fi device (opens in new tab). And corresponding Instructions (opens in new tab).
Data: Wireless measurements over several weeks in a production environment (opens in new tab) (partial results publicly available)

Adversarial Queueing Events Simulation Tool [SIGMETRICS’14]

Adversarial Queueing Theory (AQT) has shown that seemingly
innocent traffic injection rates might lead to unbounded
queues in packet-switched networks – depending on scheduling
strategies as well as topological characteristics. Little
attention has been given to quantifying these effects in realistic
network configurations. In particular, the existing AQT
literature makes two unrealistic assumptions: infinite buffers
and perfect synchrony. Because finite buffers inherently limit
queue sizes, adversarial effects ultimately lead to packet loss
which we address in this work. In addition, we study the
effect of imperfect network synchronization under the packet
loss metric. Our results, using analysis and simulation, indicate
that classical AQT examples appear harmless under
realistic assumptions but for a novel class of adversaries considerably
higher loss can be observed. We introduce this
class by giving examples of two new AQT concepts to construct
loss-efficient network adversaries. Our analysis proves
the robustness of these new adversaries against randomized
de-synchronization effects in terms of variable link delays
and nodal processing.

Paper: On the Relevance of Adversarial Queueing Theory
in Practice, ACM SIGMETRICS, 2014 (opens in new tab)
Code: Simulation source code (opens in new tab) in C++. And corresponding Description (opens in new tab).