[Thesis Defense] Efficient Systems for Large-Scale Graph Representation Learning
Speaker
Host
Abstract:
Training modern graph representation learning models on large-scale datasets suffers severe performance problems on current system architectures. These limitations stem primarily from the irregular structure of graph data, which inflates both data movement costs and computational overheads. Addressing these challenges requires full-stack optimization that exploits existing hardware capabilities. The thesis first shows how to mitigate the substantial system input/output (I/O) overhead by adapting the training algorithm to the commodity hardware architectures prevalent in data centers. Building on this foundation, the thesis introduces joint compiler optimizations for graph neural network training that streamline sampling and model computation, eliminate redundancy, and reduce complexity, thereby enabling efficient use of compute resources, particularly GPUs.