John Peebles: Thesis Defense: Fast Spectral Primitives for Directed Graphs


John Peebles


Jon Kelner, Ronitt Rubinfeld and Virginia Williams

In recent years, there has been substantial progress in the field of algorithms achieved by exploiting linear algebraic and optimization perspectives, often combined with much more traditional algorithmic tools such as randomization and recursion. We will discuss several problems that benefited from combinations of these approaches as well as other problems of interest. More concretely:

We give an overview of the first nearly linear time algorithms for a large class of directed graph problems including computing the stationary distribution of a Markov chain with only a logarithmic dependence on the mixing time. Our approach is based on developing new spectral tools for directed graphs, including the first algorithms for sparsifying directed graphs and solving directed Laplacian linear systems.
Symmetric diagonally dominant matrices frequently arise in science and engineering applications, often when one is discretization certain types of differential equations. We give an overview of faster algorithms for estimating the determinant of a symmetric diagonally dominant matrix and for sampling random spanning trees from a graph.Generative adversarial networks (GANs) are an important technique used in deep learning. However, the methods for training them are not satisfactorily understood from a theoretical perspective. To help improve this understanding, we devise a parametric problem which is sophisticated enough to capture many of the main difficulties associated with GAN training, yet simple enough to analyze rigorously.
Fibonacci heaps are a data structure that provide a heap (a priority queue) and have optimal runtimes (in an amortized sense) for these operations. They are especially useful when one has to change the priorities of element many more times than one needs to remove elements from the data structure. We give an overview of how to resolve two conjectures regarding the efficiency of variants of Fibonacci heaps, the first due to Karger and the second due to Fredman.
Perhaps the most basic statistical question one can ask is how many samples of data one needs in order to check whether the data came from a hypothesis distribution or not. This is sometimes called goodness of fit testing in statistics or identity testing in computer science. We give an overview of the first algorithms that provably use as few samples as possible for these problems in all parameter regimes.