Project

High-Performance Parallel Clustering

We are designing new parallel algorithms, optimizations, and frameworks for clustering large-scale graph and geometric data.

Clustering is an important machine learning and data mining technique that groups objects so that objects in the same group are more similar to each other than to objects in different groups. Clustering has numerous applications, including Internet and social network analysis, geographic information systems, computer vision, natural language processing, and marketing.

The goal of this project is to design novel parallel algorithms and optimizations for clustering large-scale graph and geometric data. We intend to consider various classes of clustering algorithms and to use them to cluster large datasets in AI applications. We will also design high-level programming frameworks that make it easier to write new high-performance clustering algorithms, and we plan to develop a benchmark suite for comparing both the performance of different algorithms and their clustering quality under different metrics.