Software-Defined Far Memory in Warehouse-Scale Computers

Speaker

Junwhan Ahn
Google

Host

Professor Daniel Sanchez
CSG - CSAIL - MIT

Increasing memory demand and the slowdown in technology scaling pose significant challenges to the total cost of ownership (TCO) of warehouse-scale computers (WSCs). One promising way to reduce memory TCO is to add a cheaper but slower "far memory" tier and use it to store infrequently accessed (cold) data. However, introducing a far memory tier raises new challenges: responding dynamically to workload diversity and churn, minimizing stranded capacity, and supporting brownfield (legacy) deployments.

We present a novel software-defined approach to far memory that proactively compresses cold memory pages, effectively creating a far memory tier in software. Our end-to-end system design encompasses new methods for defining performance service-level objectives (SLOs), a mechanism for identifying cold memory pages while meeting the SLO, and an implementation in the OS kernel and node agent. We also design learning-based autotuning that periodically adapts the system to fleet-wide changes without a human in the loop. The system has been deployed across Google's WSCs since 2016 and serves thousands of production services. Our software-defined far memory is significantly cheaper (a memory cost reduction of 67% or more) while keeping access latency relatively low (6 µs), and it lets us store a significant fraction of infrequently accessed data (20% on average), which translates into significant TCO savings at warehouse scale.
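The abstract does not spell out how cold pages are identified while meeting the SLO. The sketch below illustrates one plausible policy under stated assumptions: a page counts as cold once it has been idle for at least a threshold T, and each access to a compressed page forces a "promotion" (decompression). The function picks the smallest candidate T whose estimated promotion rate, normalized by the number of resident pages, stays at or below a target. The function name, its parameters, and this normalization are hypothetical illustrations, not the speaker's actual interface or algorithm.

```python
from typing import Optional, Sequence, Tuple


def choose_cold_threshold(
    page_idle_ages: Sequence[float],      # current idle age (s) of each resident page
    reaccess_idle_ages: Sequence[float],  # idle age (s) each page had when it was re-accessed
    window_minutes: float,                # length of the observation window for re-accesses
    candidate_thresholds: Sequence[float],
    target_rate: float,                   # hypothetical SLO: promotions per minute per resident page
) -> Optional[Tuple[float, int]]:
    """Return (threshold, cold_page_count) for the most aggressive threshold
    that meets the target promotion rate, or None if none does."""
    resident = len(page_idle_ages)
    if resident == 0 or window_minutes <= 0:
        return None
    # Smaller thresholds compress more memory, so try them first.
    for t in sorted(candidate_thresholds):
        # A re-access to a page that had been idle >= t would have hit
        # compressed memory and required a promotion (decompression).
        promotions = sum(1 for age in reaccess_idle_ages if age >= t)
        rate = promotions / window_minutes / resident
        if rate <= target_rate:
            cold_pages = sum(1 for age in page_idle_ages if age >= t)
            return t, cold_pages
    return None
```

For example, with ten resident pages, four re-accesses observed over one minute, and a deliberately loose toy target of 0.15, thresholds of 60 s and 120 s each yield two promotions (rate 0.2), so the sketch settles on 240 s and marks the four pages idle that long as cold:

```python
ages = [0, 30, 130, 200, 400, 700, 1000, 5, 10, 2000]
print(choose_cold_threshold(ages, [5, 20, 150, 300], 1.0, [60, 120, 240, 480], 0.15))
# (240, 4)
```

Trying thresholds in ascending order reflects the trade-off the abstract describes: a lower threshold moves more data into the cheap compressed tier, but only as far as the performance SLO allows.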

This paper will be presented at ASPLOS 2019 in Providence, RI, on April 15. A lightning-talk video for the paper is available on YouTube at https://youtu.be/aKddds6jn1s.

*Bio*: Junwhan Ahn is a Senior Software Engineer on the Platforms team at Google. He received his Ph.D. in electrical engineering and computer science from Seoul National University in 2017. His past research focused on memory system design, emerging memory technologies, and processing in memory. His current research interests include optimizing memory and storage system design for datacenter workloads and using machine learning to optimize systems.

Refreshments at 1:45 pm.