This project aims to develop an improved alignment algorithm for barcoded short reads as produced by 10X Genomics' sequencing platform.

To overcome the shortcomings of standard short-read sequencing, several so-called "third-generation" sequencing technologies have emerged in recent years, including those of Pacific Biosciences, Oxford Nanopore and, indeed, 10X Genomics. Each of these platforms gives rise to novel data types that require revised or reinvented algorithms. Here we focus on the barcoded reads produced by 10X Genomics' sequencing platform and investigate how they can be aligned most effectively in terms of both speed and accuracy. The current standard barcoded alignment method combines best-mapping and all-mapping, then iteratively assigns each read to one of its candidate mappings via a Markov random field. We instead explore pairing a fast and sensitive all-mapper with a variant of expectation maximization to select the optimal alignments.
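
To make the expectation-maximization idea concrete, the following is a minimal, generic sketch of how EM can resolve reads with several candidate mappings by exploiting shared barcodes. It is not the project's actual algorithm. The function name `em_assign`, the input layout, the fixed-size genomic bins, and the pseudocount are all assumptions chosen for illustration. The intuition is that reads carrying the same barcode tend to originate from the same long molecule, so a candidate mapping that lands near other reads with that barcode should receive more weight.

```python
from collections import defaultdict


def em_assign(reads, n_iter=20, bin_size=50_000, pseudo=1e-3):
    """Soft-assign each read to one of its candidate mappings.

    reads: list of (barcode, candidates), where candidates is a list of
           (chrom, pos, score) tuples and score > 0 is an alignment
           likelihood reported by an all-mapper (hypothetical input format).
    Returns one list of responsibilities per read, each summing to 1.
    """
    def key(barcode, chrom, pos):
        # Group mappings by barcode and coarse genomic bin (an assumption).
        return (barcode, chrom, pos // bin_size)

    keys = {key(bc, c, p) for bc, cands in reads for c, p, _ in cands}
    # Unnormalised weights suffice: responsibilities are normalised per
    # read, and every candidate of a read shares the same barcode.
    theta = {k: 1.0 for k in keys}
    resp = []

    for _ in range(n_iter):
        # E-step: responsibility of each candidate given current weights.
        resp = []
        for bc, cands in reads:
            w = [theta[key(bc, c, p)] * s for c, p, s in cands]
            total = sum(w)
            resp.append([x / total for x in w])

        # M-step: expected number of reads per (barcode, bin),
        # smoothed with a small pseudocount.
        counts = defaultdict(float)
        for (bc, cands), r in zip(reads, resp):
            for (c, p, _), rj in zip(cands, r):
                counts[key(bc, c, p)] += rj
        theta = {k: counts[k] + pseudo for k in keys}

    return resp


# Toy example: two uniquely mapped reads and one ambiguous read,
# all sharing the (made-up) barcode "BC1".
example_reads = [
    ("BC1", [("chr1", 1_000, 1.0)]),
    ("BC1", [("chr1", 2_000, 1.0)]),
    ("BC1", [("chr1", 3_000, 1.0), ("chr2", 900_000, 1.0)]),
]
print(em_assign(example_reads)[2])
```

In the toy example the ambiguous read scores equally at both candidate positions, so only the barcode evidence can separate them; the iterations shift its weight toward the chr1 candidate that sits alongside the other reads with the same barcode. A real implementation would need a more careful model of molecule extent than fixed bins.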

Research Areas