Seq: a high-performance language for bioinformatics

Speaker

MIT

Host

Julian Shun

MIT CSAIL

Abstract: The scope and scale of biological data are increasing at an exponential rate as technologies like next-generation sequencing become radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100, a factor of roughly 10^6, and the amount of data to be analyzed has increased proportionally, necessitating high-performance tools and methods to keep pace. Here we introduce Seq, a high-performance, Pythonic language for bioinformatics and computational genomics, which bridges the gap between the performance of low-level languages like C and C++ and the ease of use of high-level languages like Python. The Seq compiler employs numerous domain-specific optimizations and often attains even better performance than hand-optimized implementations of many important algorithms; we discuss and evaluate these optimizations.

Bio: I'm a graduate student at MIT CSAIL focusing on computational genomics, working with Prof. Bonnie Berger and Prof. Saman Amarasinghe. More specifically, my graduate research involves developing fast, accurate, and easy-to-use algorithms and software for processing the ever-increasing volume of genomic data being produced. I focus mainly on third-generation sequencing data and on related applications such as sequence alignment, assembly, genotyping, and phasing.

Date

May 04 2020, 14:00 to 15:00 (America/New_York)

Location

Zoom https://mit.zoom.us/j/536883569 Password 008943

Organizer & Contact

Julian Shun

jshun@mit.edu

Part of

Fast Code Seminar 2019

Related Events

June 11

The Resurgence of Software Performance Engineering

July 09

The Sparse Tensor Algebra Compiler