RobinHood: Tail Latency-Aware Caching

Speaker: Daniel S. Berger (CMU)

Host: Frans Kaashoek

ABSTRACT
Tail latency is critically important in user-facing web services. However, achieving low tail latency is challenging because a typical user request results in multiple queries to a variety of complex backends (databases, recommender systems, ad systems, etc.), and the request is not complete until all of its queries have completed.

In this talk, we present our findings from several large web services at Microsoft. We find that backend query latencies vary by more than two orders of magnitude across backends and fluctuate widely over time. Because each user request must wait for its slowest query, this variability causes high request tail latency.
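To make the "wait for the slowest query" effect concrete, here is a minimal simulation sketch. It assumes three hypothetical backends (`db`, `recommender`, `ads`) with made-up exponential latency distributions; none of these numbers come from the Microsoft measurements. Each simulated request fans out one query to every backend, and its latency is the maximum of those query latencies, so the request P99 can never be lower than any single backend's P99.

```python
import math
import random

# Hypothetical mean query latencies (ms); illustrative only, not measured data.
BACKEND_MEAN_MS = {"db": 2.0, "recommender": 10.0, "ads": 40.0}


def request_latency(query_latencies):
    """A request completes only when its slowest query completes."""
    return max(query_latencies)


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(num_requests=10_000, seed=0):
    """Fan each request out to every backend; latencies are assumed exponential."""
    rng = random.Random(seed)
    per_backend = {name: [] for name in BACKEND_MEAN_MS}
    requests = []
    for _ in range(num_requests):
        sample = {name: rng.expovariate(1.0 / mean)
                  for name, mean in BACKEND_MEAN_MS.items()}
        for name, lat in sample.items():
            per_backend[name].append(lat)
        requests.append(request_latency(sample.values()))
    return per_backend, requests


if __name__ == "__main__":
    per_backend, requests = simulate()
    for name, lats in per_backend.items():
        print(f"{name:12s} P99 = {percentile(lats, 99):7.1f} ms")
    print(f"{'request':12s} P99 = {percentile(requests, 99):7.1f} ms")
```

Running it shows the request P99 tracking (and slightly exceeding) the P99 of the slowest backend, which is why a single slow or overloaded backend can dominate request tail latency.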

We propose a novel solution for maintaining low request tail latency: repurpose existing caches to mitigate the effects of backend latency variability. Our solution, RobinHood, dynamically reallocates cache resources from backends that do not affect request latency to those that do, effectively balancing load across heterogeneous backend services. While common intuition says that "caching does not address tail latency", our evaluation shows that RobinHood is very effective. On a 50-server cluster and in the presence of load spikes, RobinHood meets a 150ms SLO 99.7% of the time, whereas the next-best caching system meets this SLO only 70% of the time.
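The abstract does not spell out RobinHood's reallocation policy, so the following is only a hypothetical simplification of the idea "move cache from backends that don't affect request latency to those that do", not the authors' algorithm. It assumes we log per-request query latencies over a time window, count how often each backend was the slowest query of a request above a tail threshold (`blocking_counts`), then take a small fraction (`tax_fraction`, an assumed parameter) of every backend's cache share and give the pooled space to the backend that blocked the most tail requests. All names, units, and values are illustrative.

```python
from collections import Counter


def blocking_counts(requests, tail_threshold_ms):
    """Count, per backend, how often it was the slowest query of a tail request.

    `requests` is a list of dicts mapping backend name -> query latency (ms).
    """
    counts = Counter()
    for queries in requests:
        if max(queries.values()) >= tail_threshold_ms:
            slowest = max(queries, key=queries.get)
            counts[slowest] += 1
    return counts


def reallocate(cache_shares, counts, tax_fraction=0.01):
    """Take `tax_fraction` of every backend's cache share and give the pooled
    space to the backend that most often blocked tail requests.

    The total cache size is preserved; with no tail requests, nothing moves.
    """
    if not counts:
        return dict(cache_shares)
    new_shares = {b: s * (1 - tax_fraction) for b, s in cache_shares.items()}
    pool = sum(cache_shares.values()) - sum(new_shares.values())
    winner = max(counts, key=counts.get)
    new_shares[winner] = new_shares.get(winner, 0.0) + pool
    return new_shares


if __name__ == "__main__":
    shares = {"db": 40.0, "recommender": 30.0, "ads": 30.0}  # hypothetical GB
    window = [
        {"db": 3.0, "recommender": 12.0, "ads": 180.0},
        {"db": 2.0, "recommender": 160.0, "ads": 20.0},
        {"db": 4.0, "recommender": 9.0, "ads": 210.0},
    ]
    counts = blocking_counts(window, tail_threshold_ms=150.0)
    print(counts)  # Counter({'ads': 2, 'recommender': 1})
    print(reallocate(shares, counts))
```

Repeating this step every window lets cache space drift toward whichever backend currently causes tail requests, while backends that are rarely the bottleneck slowly give up space; the small per-step fraction keeps any single noisy window from causing a large swing.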

Joint work with Benjamin Berg (CMU), Timothy Zhu (Penn State), Mor Harchol-Balter (CMU), and Siddhartha Sen (MSR). This work will appear at USENIX OSDI 2018.

SPEAKER BIO
Daniel S. Berger is the 2018 Mark Stehlik Postdoctoral Fellow in the Computer Science Department at Carnegie Mellon University. His research interests lie at the intersection of systems, mathematical modeling, and performance testing. Daniel's research explores how caching can be used to reduce tail latency in large web services and CDNs. Daniel received his Ph.D. (2018) from the University of Kaiserslautern, Germany, and has spent extended visits at CMU (2015-2017), Warwick University (2014), T-Labs Berlin (2013), ETH Zurich (2012), and the University of Waterloo (2011). Previously, Daniel worked as a data scientist at the German Cancer Research Center (2008-2010) and as a project scientist at CMU (2017-2018).