Seq: a high-performance language for bioinformatics

Speaker

MIT

Host

Julian Shun
MIT CSAIL
Abstract: The scope and scale of biological data are increasing at an exponential rate as technologies like next-generation sequencing become radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100 (a factor of over 10^6), and the amount of data to be analyzed has increased proportionally, necessitating high-performance tools and methods to keep pace. Here we introduce Seq, a high-performance, Pythonic language for bioinformatics and computational genomics, which bridges the gap between the performance of low-level languages like C and C++ and the ease of use of high-level languages like Python. The Seq compiler employs numerous domain-specific optimizations, which we discuss and evaluate, and often attains better performance than hand-optimized implementations of many important algorithms.

Bio: I'm a graduate student at MIT CSAIL focusing on computational genomics, working with Prof. Bonnie Berger and Prof. Saman Amarasinghe. More specifically, my graduate research involves developing fast, accurate, and easy-to-use algorithms and software for processing the ever-increasing volume of genomic data being produced. I focus mainly on third-generation sequencing data and related applications such as sequence alignment, assembly, genotyping, and phasing.