Thesis Defense: Programmable Architectural Support for Diverse Sparse Workloads - Ryan Lee
Defense Title: Programmable Architectural Support for Diverse Sparse Workloads
Sparsity is abundant in many workload domains, but it presents challenges that result in underutilization of the available resources in existing hardware. Sparse workloads exhibit irregular control flow and long-latency memory accesses that starve the core of useful work, and they perform fine-grained accesses that use the available memory bandwidth inefficiently.
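As an illustrative sketch only (not taken from the thesis; all names below are hypothetical), a sparse matrix-vector multiply over a CSR matrix shows the data-dependent indirection and fine-grained accesses that cause these problems:

```c
#include <stddef.h>

/* Illustrative CSR sparse matrix-vector multiply (y = A*x).
 * The struct layout and names are hypothetical, for exposition only. */
typedef struct {
    size_t  num_rows;
    size_t *row_ptr;   /* row_ptr[r]..row_ptr[r+1] bound row r's nonzeros */
    size_t *col_idx;   /* column index of each nonzero */
    double *vals;      /* value of each nonzero */
} csr_matrix;

void spmv_csr(const csr_matrix *A, const double *x, double *y) {
    for (size_t r = 0; r < A->num_rows; r++) {
        double acc = 0.0;
        /* Loop bounds depend on the data, so control flow is irregular. */
        for (size_t k = A->row_ptr[r]; k < A->row_ptr[r + 1]; k++) {
            /* x[A->col_idx[k]] is a data-dependent, fine-grained access:
             * it is hard to prefetch, often misses, and uses only a few
             * bytes of each cache line brought over the memory bus. */
            acc += A->vals[k] * x[A->col_idx[k]];
        }
        y[r] = acc;
    }
}
```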
Prior work has proposed several software and hardware mechanisms to accelerate sparse workloads, but a general technique that is applicable to the diverse set of applications in this domain has been lacking. In particular, existing solutions have limited support for workloads that concurrently read and update the underlying sparse data structure, such as dynamic graph applications and databases. Prior proposals instead restrict various dimensions of the applications they target, such as the data structure formats they support (e.g., only hash tables) or the types of concurrent operations they allow (e.g., read-only), limiting their applicability. In addition, prior work has insufficiently addressed inefficient data transfer between compute and memory, instead placing expensive compute elements near memory or supporting only restricted forms of fine-grained accesses.
This thesis shows that it is possible to design a general and programmable architecture that supports a wide range of sparse workloads. To this end, it presents two hardware accelerators. First, Terminus adds a small hardware unit near each core that accelerates a wide range of data structure types and concurrent reads and updates to these structures, achieving a gmean speedup of 7.4× over a CPU baseline. Second, Gist enhances each DRAM chip with a flexible hardware unit that autonomously performs fine-grained scatter/gather operations for sparse workloads. This lets Gist use the memory bus more efficiently by returning a compact stream of data, achieving a gmean speedup of 1.6× over state-of-the-art support for sparse workloads.
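For intuition only, the sketch below shows the kind of indirect gather such in-memory support targets; the function names and the traffic estimates are illustrative assumptions, not Gist's actual interface. A conventional memory system moves a full cache line per sparse element even though only a few bytes are used, whereas a gather unit in memory can return just the requested values as a compact stream.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64

/* Hypothetical illustration of an indirect gather: collect scattered
 * 8-byte values into a dense buffer. Names are illustrative only. */
void gather(const double *table, const uint32_t *indices,
            size_t n, double *out) {
    for (size_t i = 0; i < n; i++) {
        /* On a conventional system, each access transfers a 64-byte
         * cache line over the memory bus to deliver 8 useful bytes.
         * In-memory scatter/gather support can instead return only the
         * requested values as a compact stream. */
        out[i] = table[indices[i]];
    }
}

/* Rough bus-traffic estimates for n random accesses (illustrative). */
static size_t bytes_conventional(size_t n) { return n * CACHE_LINE_BYTES; }
static size_t bytes_compact(size_t n)      { return n * sizeof(double); }
```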
https://mit.zoom.us/j/8203717891
Advisor: Professor Daniel Sanchez