Redesigning the Memory Hierarchy to Exploit Static and Dynamic Application Information


Po-An Tsai, PhD Candidate


Professor Daniel Sanchez
Memory hierarchies are crucial to performance, but current systems adopt rigid, hardware-managed cache hierarchies that cause much more data movement than needed. The root cause of the inefficiencies of cache hierarchies is that they adopt an inexpressive, legacy interface and **ignore most application information**. Specifically, they are structured as a rigid hierarchy of progressively larger and slower cache levels, with sizes and policies fixed at design time. Caches expose a flat address space to programs that hides the hierarchy’s structure, and transparently move data across cache levels in fixed-size blocks using simple, fixed heuristics. Besides squandering valuable application-level information, this design is very costly: providing the illusion of a flat address space requires sophisticated address translation machinery, including associative lookups in caches and virtual memory.

This thesis proposes to redesign the memory hierarchy to better exploit application information. We take a cross-layer approach that redesigns the hardware-software interface to put software in control of the hierarchy and naturally convey application semantics. We focus on two main directions:

First, we design reconfigurable cache hierarchies that exploit **dynamic application information** to optimize their structure on the fly, approaching the performance of the best application-specific hierarchy. Hardware monitors application memory behavior at low overhead, and a software runtime uses this information to periodically reconfigure the system. This approach enables software to (i) build single- or multi-level virtual cache hierarchies tailored to the needs of each application, making effective use of spatially distributed and heterogeneous (e.g., SRAM and stacked DRAM) cache banks; (ii) replicate shared data near-optimally to minimize on-chip and off-chip traffic; and (iii) schedule computation across systems with heterogeneous hierarchies (e.g., systems with near-data processors).
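As a rough illustration of the monitor-then-reconfigure loop described above, the sketch below shows one simple policy a software runtime could apply: given per-application miss curves (assumed here to come from hardware monitors), it hands out cache banks greedily to whichever application benefits most from one more bank. The function name `partition_cache`, the miss-curve format, and the greedy marginal-utility policy are all assumptions made for illustration; they are a stand-in, not the algorithm used in the thesis.

```python
def marginal_gain(curve, banks):
    """Misses avoided by granting one more bank, given `banks` already held."""
    if banks + 1 >= len(curve):
        return 0
    return curve[banks] - curve[banks + 1]


def partition_cache(miss_curves, total_banks):
    """Greedy marginal-utility partitioning (hypothetical policy).

    miss_curves[app][k] is the monitored miss count of `app` with k banks.
    Returns a dict mapping each application to its number of banks.
    """
    alloc = {app: 0 for app in miss_curves}
    for _ in range(total_banks):
        # Pick the application that saves the most misses with one more bank.
        best = max(alloc, key=lambda app: marginal_gain(miss_curves[app], alloc[app]))
        if marginal_gain(miss_curves[best], alloc[best]) <= 0:
            break  # no application benefits from more capacity
        alloc[best] += 1
    return alloc


# Made-up miss curves: one streaming application that barely benefits from
# cache capacity, and one whose working set fits in about three banks.
curves = {
    "streaming": [100, 98, 97, 96, 96],
    "cache_friendly": [100, 60, 30, 27, 27],
}
allocation = partition_cache(curves, total_banks=4)
print(allocation)
```

A real runtime would rerun such a policy periodically as the monitored curves change, and would also have to decide where each allocation is placed among distributed banks, which this sketch ignores.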

Second, we redesign the memory hierarchy to exploit **static application information** by managing variable-sized objects, the natural unit of data access in programs, instead of fixed-size cache lines. This work will be the focus of the talk. We present the Hotpads object-based hierarchy, which leverages object semantics to hide the memory layout and dispense with the flat address space interface. Similarly to how memory-safe languages abstract the memory layout, Hotpads exposes an interface based on object pointers that disallows arbitrary address arithmetic. This avoids the need for associative caches. Instead, Hotpads moves objects across a hierarchy of directly addressed memories. It rewrites pointers to avoid most associative lookups, provides hardware support for memory allocation, and unifies hierarchical garbage collection and data placement. Hotpads also enables many new optimizations. For instance, we have designed Zippads, a memory hierarchy that leverages Hotpads to compress objects. Leveraging object semantics and the ability to rewrite pointers in Hotpads, Zippads compresses and stores objects more compactly, with a novel compression algorithm that exploits redundancy across objects. Though object-based languages are often seen as sacrificing performance for productivity, this work shows that hardware can exploit this abstraction to improve performance: Hotpads reduces dynamic memory hierarchy energy by 2.6× and improves performance by 34%; and Zippads reduces main memory footprint by 2× while improving performance by 30%.
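The idea of rewriting pointers to avoid associative lookups can be pictured with a toy two-level model. In this sketch, the first dereference of a pointer into the larger level pays for one associative lookup and copies the object into a small, directly addressed memory; the pointer field is then rewritten so later dereferences index that memory directly. The class and field names (`ToyHierarchy`, `Ptr`, `forward`) are hypothetical, and the model omits eviction, allocation, and garbage collection; it is not a description of the actual Hotpads hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Ptr:
    level: str  # "pad" (small, directly addressed) or "mem" (backing store)
    addr: int


@dataclass
class Obj:
    payload: str
    fields: dict = field(default_factory=dict)  # field name -> Ptr


class ToyHierarchy:
    def __init__(self):
        self.mem = {}           # backing store: address -> Obj
        self.pad = []           # directly addressed memory: slot -> Obj
        self.forward = {}       # mem address -> pad slot (associative map)
        self.assoc_lookups = 0  # how many times the associative map is used

    def _fetch(self, mem_addr):
        """Find or create the pad copy of an object; costs one lookup."""
        self.assoc_lookups += 1
        if mem_addr not in self.forward:
            src = self.mem[mem_addr]
            self.pad.append(Obj(src.payload, dict(src.fields)))
            self.forward[mem_addr] = len(self.pad) - 1
        return self.forward[mem_addr]

    def load_root(self, mem_addr):
        """Bring a root object into the pad and return a pad pointer to it."""
        return Ptr("pad", self._fetch(mem_addr))

    def deref(self, holder, name):
        """Follow field `name` of the pad object that `holder` points to."""
        obj = self.pad[holder.addr]
        target = obj.fields[name]
        if target.level == "mem":
            target = Ptr("pad", self._fetch(target.addr))
            obj.fields[name] = target  # rewrite the pointer in place
        return target


h = ToyHierarchy()
h.mem[0x100] = Obj("head", {"next": Ptr("mem", 0x200)})
h.mem[0x200] = Obj("tail")

root = h.load_root(0x100)
first = h.deref(root, "next")   # associative lookup, then pointer rewrite
second = h.deref(root, "next")  # direct access through the rewritten pointer
print(h.assoc_lookups, h.pad[second.addr].payload)
```

After the first dereference, the copy of `head` in the small memory holds a direct pointer to `tail`, so repeated traversals no longer touch the associative map. The copy left in the backing store is unchanged in this toy model.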