Parallel Binary Trees Using Just Join
Host
Julian Shun
MIT CSAIL
Abstract:
Tree structures play an important role in almost every area in computer science. Nowadays the explosion of data puts forward a higher demand for the performance of the trees, which requires parallelism, concurrency, work-efficiency, and full-featured interface.
Our work proposes a parallel tree structure, which bases a full-featured interface on a single operation Join. The join operation captures all balancing criteria, enables parallelism and work-efficiency, deals with a variety of augmentations, and implements persistence. As a result, our tree structure is highly-parallelized, safe for concurrency, theoretically work-efficient, supporting a wide range of functions and augmentations, multi-versioned and allowing efficient GC. Because of these good properties, our tree structure is applicable to many applications, such as many geometric algorithms, inverted index searching, HTAP (Hybrid Transactional and Analytical Processing) database systems with snapshot isolation, multi-version concurrency control for transactional memories, etc.
We implemented the tree structure in a C++ library PAM, based on which we implemented all the mentioned applications. PAM’s interface greatly simplifies programming over direct implementations. Using PAM, each application mentioned above only needs about 100 lines of code.
Besides simplicity, experiments also show that our implementations are practically efficient. Our code achieves performance that matches or exceeds existing libraries designed specially for a single application, both sequentially and in parallel. The parallel speedup ranges from 40 to 100 on 72 cores with 2-way hyperthreading.
Bio:
Yihan Sun is currently a Ph.D. student in Computer Science Department at Carnegie Mellon University, advised by Guy Blelloch. Prior to that, she received her Bachelor’s degree in Computer Science from Tsinghua University, working on data mining and social network analysis. Her research interests broadly lie in the theory and practice of parallel algorithms, data structures, as well as their implementations and applications.
Tree structures play an important role in almost every area in computer science. Nowadays the explosion of data puts forward a higher demand for the performance of the trees, which requires parallelism, concurrency, work-efficiency, and full-featured interface.
Our work proposes a parallel tree structure, which bases a full-featured interface on a single operation Join. The join operation captures all balancing criteria, enables parallelism and work-efficiency, deals with a variety of augmentations, and implements persistence. As a result, our tree structure is highly-parallelized, safe for concurrency, theoretically work-efficient, supporting a wide range of functions and augmentations, multi-versioned and allowing efficient GC. Because of these good properties, our tree structure is applicable to many applications, such as many geometric algorithms, inverted index searching, HTAP (Hybrid Transactional and Analytical Processing) database systems with snapshot isolation, multi-version concurrency control for transactional memories, etc.
We implemented the tree structure in a C++ library PAM, based on which we implemented all the mentioned applications. PAM’s interface greatly simplifies programming over direct implementations. Using PAM, each application mentioned above only needs about 100 lines of code.
Besides simplicity, experiments also show that our implementations are practically efficient. Our code achieves performance that matches or exceeds existing libraries designed specially for a single application, both sequentially and in parallel. The parallel speedup ranges from 40 to 100 on 72 cores with 2-way hyperthreading.
Bio:
Yihan Sun is currently a Ph.D. student in Computer Science Department at Carnegie Mellon University, advised by Guy Blelloch. Prior to that, she received her Bachelor’s degree in Computer Science from Tsinghua University, working on data mining and social network analysis. Her research interests broadly lie in the theory and practice of parallel algorithms, data structures, as well as their implementations and applications.