Abstract: Graph clustering is a fundamental problem in graph mining with important applications ranging from social networks analysis and entity resolution to computer vision and semi-supervised learning. In this talk I will discuss two novel variations of graph clustering. In the first part of the talk I will discuss the following clustering problem: there exist two latent clusters, and we are allowed to query any pair of nodes whether they belong to the same cluster or not, but the answer to the query is corrupted with some probability less than 1/2. Can we recover the clusters with high probability and what is the minimum number of queries needed? I will present two state-of-the-art algorithms and a closely related application on predicting signs in online social networks. Our results improve recent results by Mazumdar and Saha [NIPS’16]. In the second part of the talk, I will discuss how to exploit motifs to uncover communities. Specifically, I will introduce the concept of motif-expanders that serves as the basis for motif-based clustering. I will present efficient algorithms and a well-performing heuristic that outperforms some of the most popular graph clustering tools. I will conclude with some experimental results that show the effectiveness of our methods to multiple applications in machine learning and graph mining.
Joint work with Michael Mitzenmacher (Harvard), Jakub Pachocki (OpenAI), Kasper Green Larsen (Aarhus), Jarosaw Basiok (Harvard), Ben Lawson (BU), Preetum Nakkiran (Harvard), Vasileios Nakos (Harvard)
Bio: Babis Tsourakakis is an assistant professor in computer science at Boston University and a research associate at Harvard. Tsourakakis obtained his PhD in Algorithms, Combinatorics and Optimization at Carnegie Mellon under the supervision of Alan Frieze, was a postdoctoral fellow at Brown University and Harvard under the supervision of Eli Upfal and Michael Mitzenmacher respectively. Before joining Boston University, he worked as a researcher in the Google Brain team. He won a best paper award in IEEE Data Mining, has delivered three tutorials in the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, and has designed two graph mining libraries for large-scale graph mining, one of which has been officially included in Windows Azure. His research focuses on large-scale graph mining, and machine learning.