May 15

On storage of and in DNA

Tsachy Weissman
Stanford University
Time: 2019-05-15, 11:30 to 13:00 (America/New_York)
Location: 32-G575

Abstract: I'll first talk about activity in my group geared toward effective storage, communication, querying, and processing of the staggering amounts of DNA data being generated. Then I'll say something about our recent activity on DNA based storage systems, which have the potential to offer substantially higher storage densities and durability than current technologies.

May 13

Explainable Artificial Intelligence in Precision Medicine

Time: 2019-05-13, 11:30 to 13:00 (America/New_York)
Location: 32-D463 Star (ROOM CHANGE)

Modern machine learning (ML) models can accurately predict patient progress and outcomes. However, they do not explain why selected features make sense or why a particular prediction was made. For example, a model may predict that a patient will get chronic kidney disease, which can lead to kidney failure. The lack of explanations about which features drove the prediction – e.g., high systolic blood pressure, high BMI, or others – hinders medical professionals in making diagnoses and decisions on appropriate clinical actions. I will briefly describe my group’s efforts to develop interpretable ML techniques for varied medical applications, including treating cancer based on a patient’s own molecular profile, identifying therapeutic targets for Alzheimer’s, predicting kidney diseases, preventing complications during surgery, enabling pre-hospital diagnoses for trauma patients, and improving our understanding of pan-cancer biology and genome biology. My talk will focus in greater detail on: MERGE, which uses ML to target treatment of acute myeloid leukemia, published in Nature Communications (Jan 2018); our explainable artificial intelligence system, Prescience, for preventing hypoxemia in patients under anesthesia, recently featured on the cover of Nature Biomedical Engineering (Oct 2018); and SHAP, our general ML framework on model interpretability, published as a full oral presentation at Neural Information Processing Systems (Dec 2017; cited 150).

Short bio: Prof. Su-In Lee is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering and an Adjunct Associate Professor in the Departments of Genome Sciences, Electrical Engineering, and Biomedical Informatics and Medical Education at the University of Washington. She completed her PhD in 2009 at Stanford University with Prof. Daphne Koller (Stanford Artificial Intelligence Laboratory). Before joining the UW in 2010, Lee was a visiting professor in the Computational Biology Department at Carnegie Mellon University. She has received the National Science Foundation CAREER Award and been named an American Cancer Society Research Scholar. She has received numerous generous grants from the National Institutes of Health, the National Science Foundation, and the American Cancer Society. Lee is currently the PI for the following active grants: NIH/NIA R01, NIH/NLM R21, NIH/NIGMS R35, NSF/BIO INNOVATION, NSF/BIO CAREER, and ACS Research Scholar.

May 01

Data structures to represent sets of k-mers

Time: 2019-05-01, 11:30 to 13:00 (America/New_York)
Location: 32-G575

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying k-mer sets has emerged as a shared underlying component and there have been many specialized data structures for their representation. In this talk, I will describe the applications of k-mer sets in bioinformatics and motivate the need for specialized data structures. I will give an overview of known approaches and lower bounds, with a focus on unitig-based representations. Finally, I will describe a data structure for representing sets of k-mer sets, called the HowDe Sequence Bloom Tree.

Bio: Paul Medvedev is an Associate Professor in the Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology and the Director of the Center for Computational Biology and Bioinformatics at the Pennsylvania State University. His research focus is on developing computer science techniques for analysis of biological data and on answering fundamental biological questions using such methods. Prior to joining Penn State in 2012, he was a postdoc at the University of California, San Diego and a visiting scholar at the Oregon Health & Science University and the University of Bielefeld. He received his Ph.D. from the University of Toronto in 2010, his M.Sc. from the University of Southern Denmark in 2004, and his B.S. from the University of California, Los Angeles in 2002.

April 24

Mutational signature analysis and its applications to the clinic

Time: 2019-04-24, 11:30 to 13:00 (America/New_York)
Location: 32-G575

Different mutational processes operative in cancer and other diseases leave distinct 'signatures' in the DNA. Mutational signature analysis is an attempt to deconvolve the mutational patterns from cancer sequencing data to better identify the factors that gave rise to cancer. Whereas previous work required a large amount of signal as found in exome and genome sequencing data, our new method SigMA enables accurate detection of mutational signatures even with >100-fold reduction in data size. This allows us to extend signature analysis to gene panels, the common platform used to profile tens of thousands of cancer patients each year. I will describe the methodology behind SigMA and how it can be used to identify patients with deficiency in the homologous recombination DNA repair pathway who should be considered for treatment with PARP inhibitors. This work was led by Dr. Doga Gulhan (PhD in heavy ion physics, MIT).

Bio: Dr. Park is Professor of Biomedical Informatics at Harvard Medical School and the director of its Bioinformatics and Integrative Genomics Ph.D. program. His group (http://compbio.hms.harvard.edu) specializes in computational and statistical analysis of high-throughput sequencing data in epigenetics, cancer genetics, and neuroscience. He was originally trained in applied math (B.A., Harvard; Ph.D., Caltech), but he stumbled upon molecular biology and genetics during his postdoctoral studies. He has multiple positions open in his group for students, postdoctoral fellows, and scientific programmers.

April 17

Interface Mimicry-Based Prediction of Host-Microbe Interactions

Ruth Nussinov
Professor in the Dept. of Human Genetics, School of Medicine, Tel Aviv University and Senior Principal Scientist, Principal Investigator at the National Cancer Institute, National Institutes of Health.
Time: 2019-04-17, 11:30 to 13:00 (America/New_York)
Location: Stata Center Building 32 Room G575

Signaling pathways shape and transmit the cell's reaction to its changing environment; however, microbes can circumvent this response by manipulating host signaling. To subvert host defense, they beat it at its own game: they hijack host pathways by mimicking the binding surfaces of host-encoded proteins. For this, it is not necessary to achieve global protein homology; imitating merely the interaction surface is enough. Different protein folds often interact via similar protein-protein interface architectures. This similarity in binding surfaces permits the pathogenic protein to compete with a host target protein. Thus, rather than binding a host-encoded partner, the host protein hub binds the microbe surrogate. The outcome can be dire: rewiring or repurposing the host pathways, shifting the cell signaling landscape and consequently the immune response. By modulating key signaling pathways, microbes can cause persistent infections as well as cancer. Mapping the rewired host-pathogen 'superorganism' interaction network is critical for in-depth understanding of pathogenic mechanisms and developing efficient therapeutics. The talk will discuss the role of molecular mimicry in host evasion and describe a method, HMI-PRED, that we have developed to decipher it. Given the structure of the microbial protein, it predicts structural models of potential host-microbe interaction complexes, the list of mimicked/disrupted host endogenous, tissue expression of the microbe-targeted host proteins, and the structural superorganism network.

Emine Guven-Maiorov, Asma Hakouz, Ozlem Keskin, Attila Gursoy, Chung-Jung Tsai & Ruth Nussinov
Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD 21702, U.S.A.

Ruth Nussinov is a computational structural biologist at the NCI. Her Ph.D. thesis proposed the dynamic programming algorithm for the prediction of RNA secondary structure, which is still the primary method toward this aim. She was among the pioneers of DNA sequence analysis and proposed the fundamental concept of Conformational Selection and Population shift as an alternative to the textbook ‘Induced-Fit’ model in molecular recognition. Her studies unveiled the key role of allostery under normal conditions and in disease and the principles of allosteric drug discovery. She also proposed that proteins whose sequence and global structures differ may still share similar interface architectural motifs. This concept serves as a basis for the prediction of protein interactions. She was among the first to model amyloid conformations. During the last few years she has been focusing on signaling processes in cancer, mechanisms of activation of oncogenic proteins, and implications for drug discovery. Dr. Nussinov received her Ph.D. in 1977 from Rutgers University and did post-doctoral work in the Structural Chemistry Department of the Weizmann Institute. Subsequently she was at the Chemistry Department at Berkeley, the Biochemistry Department at Harvard, and a visiting scientist at the NIH. In 1984 she joined the Medical School at Tel Aviv University. In 1985, she accepted a concurrent position at the National Cancer Institute of the NIH, Frederick National Laboratory for Cancer Research, where she is a Senior Principal Scientist and Principal Investigator heading the Computational Structural Biology Section at the NCI. She has authored over 600 scientific papers. She served as the Editor-in-Chief of PLOS Computational Biology and as Associate Editor and on the Editorial Boards of several journals. She is a frequent speaker at domestic and international meetings, symposia, and academic institutions, has won several awards, and is an elected Fellow of the Biophysical Society and the International Society for Computational Biology. She is a Highly Cited Researcher (ranking among the top 3000 researchers or 1% across all fields according to Thomson Reuters Essential Science Indicators, http://highlycited.com/ December 2015 and 2019), earning her the mark of exceptional impact. She also won an award from the AACR in 2017 for her paper on The Key Role of Calmodulin in KRAS-Driven Adenocarcinomas.

April 10

Learning microbial dynamics for therapeutic applications: Scalable inference, robustness, and control

Time: 2019-04-10, 11:30 to 13:00 (America/New_York)
Location: Stata Center Building 32 Room G575

Microbes are everywhere, including in and on our bodies, and have been shown to play key roles in a variety of prevalent human diseases. Consequently, there has been intense interest in the design of bacteriotherapies or "bugs as drugs," which are communities of bacteria administered to patients for specific therapeutic applications. Central to the design of such therapeutics is an understanding of the causal microbial interaction network and the population dynamics of the organisms. Toward that end, I will present recent work on a Bayesian nonparametric model and associated efficient inference algorithm that addresses the key conceptual and practical challenges of learning microbial dynamics from time series microbe abundance data (Ref 1). These challenges include high-dimensional (300+ strains of bacteria in the gut) but temporally sparse and non-uniformly sampled data; high measurement noise; and nonlinear and physically non-negative dynamics. In a related work I will discuss a simpler inference problem surrounding the engineering of an interdependent consortium of bacteria (Ref 2). Here we will focus on experimental design for inference and discuss best practices for designing synthetic bacterial consortia for clinical/pharmaceutical applications. If time is left, I will also discuss recent work on analyzing gradient descent where we provide provably stable accelerated algorithms for optimization when features are time varying (Ref 3). If we wish to deploy learning algorithms in a clinical setting, then we must have provable convergence guarantees and robustness properties similar to the guarantees we have for control algorithms.

Bio: Travis just joined the faculty of Harvard Medical School in the Department of Pathology at Brigham and Women’s Hospital (Division of Computational Pathology). He did post-doctoral training in statistical inference and experimental biology with Dr. Georg Gerber and in the area of microbial dynamics/networks with Dr. Yang-Yu Liu. His PhD was in Control Theory with a minor in Mathematics (Analysis) from MIT. Outside of academic research Travis has worked for NASA, Boeing, and Johnson and Johnson. Travis’ algorithms are currently flying on air vehicles (unmanned and experimental aircraft, NOT the Boeing 737 Max 8).

April 03

Lineage calling can identify antibiotic resistant clones within minutes

Time: 2019-04-03, 11:30 to 13:00 (America/New_York)
Location: 32-G575

Surveillance of circulating drug resistant bacteria is essential for healthcare providers to deliver effective empiric antibiotic therapy. However, the results of surveillance may not be available on a timescale that is optimal for guiding patient treatment. Here we present a method for inferring characteristics of an unknown bacterial sample by identifying the presence of sequence variation across the genome that is linked to a phenotype of interest, in this case drug resistance. We demonstrate an implementation of this principle using sequence k-mer content, matched to a database of known genomes. We show this technique can be applied to data from an Oxford Nanopore device in real time and is capable of identifying the presence of a known resistant strain in 5 minutes, even from a complex metagenomic sample. This flexible approach has wide application to pathogen surveillance and may be used to greatly accelerate diagnoses of resistant infections.

Dr. Karel Brinda’s research lies at the intersection of computer science, applied mathematics, biology and epidemiology. He develops methods for rapid prediction of antibiotic resistance from sequencing data and for epidemiological surveillance. Dr. Brinda’s previous work focused on resource-frugal methods for sequence data analysis.

Dr. Brinda is a Research Associate in the Harvard Chan School of Public Health and Harvard Medical School. He received his PhD in computer science from Universite Paris-Est, France. Besides bioinformatics, Dr. Brinda also works on methods for automatic generation of tactile maps for blind users.

March 20

Genomic analysis pipeline: overview, challenges, and proposed solutions

Time: 2019-03-20, 11:30 to 13:00 (America/New_York)
Location: 32-G575

In this talk we will give an overview of the genomic analysis pipeline, from data generation to its analysis. In doing so, we will identify the main challenges arising in the genomic setting. These include dealing with errors introduced during the sequencing process, designing state-of-the-art specialized compressors to deal with the ever growing amount of genomic data being generated, as well as improving the accuracy of the current tools used for the analysis.

We will highlight some of the effort being carried out by the international community to design a standard under the International Organization for Standardization (ISO), denoted MPEG-G, for genomic information representation. We will also introduce a new filtering tool intended to improve the accuracy of variant calling, the last step of the genomic analysis pipeline whose output is generally the starting point for analysis in the personalized medicine paradigm. We will conclude the talk with some thoughts on where the community is going and the challenges that we will face in the near future.

Dr. Idoia Ochoa is an assistant professor in the Electrical and Computer Engineering department at the University of Illinois at Urbana-Champaign (UIUC). Prior to that, Dr. Ochoa obtained a Ph.D. from the Electrical Engineering Department at Stanford University, in 2016. She received her M.Sc. from the same department in 2012. During her time at Stanford she conducted internships at Google and Genapsys, and served as a technical consultant for the HBO TV show "Silicon Valley".

Dr. Ochoa's main interests lie in the field of bioinformatics and computational genomics, and she uses a multidisciplinary approach that combines tools from information theory, signal processing, and machine learning, among others. Her main contributions include the design of several lossless and lossy compression schemes tailored to raw and aligned genomic data, as well as denoising schemes to reduce the noise present in such data. She has also developed compression schemes for other types of omics data, as well as schemes to perform similarity queries on compressed databases without the need for decompression. Finally, she has developed new methods for the discovery of gene networks specific to different cancer types.

Dr. Ochoa is also part of the group of experts that is developing, under the International Organization for Standardization (ISO), the new MPEG-G standard for genomic information representation. She is also part of the Center for Science of Information, an NSF Science and Technology Center, and she is the recipient of several US-based grants.

March 06

Bayesian machine learning models for understanding microbiome dynamics

Time: 2019-03-06, 11:30 to 13:00 (America/New_York)
Location: Building 32, Room G575 (STATA)

The human microbiome is highly dynamic on multiple timescales, changing dramatically during development of the gut in childhood, with diet, or due to medical interventions. I will present several Bayesian machine learning methods that we have developed for gaining insight into microbiome dynamics. The first, MC-TIMME (Microbial Counts Trajectories Infinite Mixture Model Engine), is a non-parametric Bayesian model for clustering microbiome time-series data that we have applied to gain insights into the temporal response of human and animal microbiota to antibiotic, infectious, and dietary perturbations. The second, MDSINE (Microbial Dynamical Systems INference Engine), is a method for efficiently inferring dynamical systems models from microbiome time-series data and predicting temporal behaviors of the microbiota, which we have applied to developing bacteriotherapies for C. difficile infection and inflammatory bowel disease. The third, Microbiome Interpretable Temporal Rule Engine (MITRE), is a method for predicting host status from microbiome time-series data, which achieves high accuracy while maintaining interpretability by learning predictive rules over automatically inferred time-periods and phylogenetically related microbes.

Dr. Gerber is a computer scientist, microbiologist and physician board certified in Clinical Pathology. He is an Assistant Professor of Pathology at Harvard Medical School and member of the Harvard-MIT Health Sciences and Technology faculty, Chief of the Division of Computational Pathology at the Brigham and Women’s Hospital (BWH), and Co-Director of the Massachusetts Host-Microbiome Center (MHMC) at BWH. His research lab builds novel computational models and experimental systems to understand the role of the microbiota in human diseases and applies these findings to develop new diagnostic tests and therapeutic interventions to improve patient care. His work has been funded by DARPA, NIH, the state of Massachusetts, private foundations, and corporate sponsorship.

Dr. Gerber’s training includes a Fellowship in Infectious Disease Pathology and Molecular Microbiology at BWH, Residency in Clinical Pathology at BWH, MD from Harvard Medical School, Master’s and PhD in Computer Science from MIT (supervised by David Gifford, Tommi Jaakkola and Rick Young), and Master’s in Infectious Diseases and BA in Pure Mathematics from UC Berkeley. Prior to returning to graduate school, he founded several companies focused on developing and applying 3D graphics technologies to create feature and IMAX® films.

February 27

February 20

Hidden cancer immunology insights from tumor RNA-seq

Time: 2019-02-20, 11:30 to 13:00 (America/New_York)
Location: 32-G575

Abstract: I will introduce our work in mining and integrating large-scale tumor molecular profiles to inform cancer immunology and immunotherapy. Tumor RNA sequencing has become cost effective over the years, and I will discuss three algorithms that we developed to extract useful insights from treatment-naïve RNA-seq samples in The Cancer Genome Atlas. First, TIMER can estimate immune cell components in tumors, and a webserver has been created for users to explore immune infiltration across TCGA tumors and make inferences on user-provided samples. Second, TRUST can assemble T cell receptor (TCR) and B cell receptor (BCR) complementarity-determining regions (CDR3s) from unselected bulk tumor RNA-seq data. Third, TIDE derived tumor immune dysfunction and tumor immune exclusion gene expression signatures from pretreatment tumors to predict patient response to anti-PD1 and anti-CTLA4 treatment. Our work indicates that tumor RNA-seq, even on treatment-naïve tumors, is a cost-effective way to inform tumor microenvironment and tumor immunity.

February 13

Deciphering molecular mechanisms of disease consequent to mutation via semi-supervised learning

Time: 2019-02-13, 11:30 to 13:00 (America/New_York)
Location: Stata Center 32-G575

Abstract: A major goal in computational biology is the development of algorithms, analysis techniques, and tools towards deep mechanistic understanding of life at a molecular level. In the process, computational biology must take advantage of the new developments in artificial intelligence and machine learning, and then extend beyond pattern analysis to provide testable hypotheses for experimental scientists. This talk will focus on our contributions to this process and relevant related work. We will first discuss the development of machine learning techniques for partially observable domains such as molecular biology; in particular, methods for accurate estimation of frequency of occurrence of hard-to-measure and rare events. We will show some identifiability results in parametric and nonparametric situations as well as how such frequencies can be used to correct estimated model accuracies. We will then show how these methods play key roles in inferring protein cellular roles and phenotypic effects of genomic mutations, with an emphasis on understanding the molecular mechanisms of human genetic disease. We further assessed the value of these methods in the wet lab where we tested the molecular mechanisms behind selected de novo mutations in a cohort of individuals with neurodevelopmental disorders. Finally, we will discuss implications for future research in machine learning, genome interpretation, and precision health.

Predrag Radivojac is a Professor of Computer Science at Northeastern University, where he recently moved from Indiana University. Prof. Radivojac received his Bachelor's and Master's degrees in Electrical Engineering from the University of Novi Sad and University of Belgrade, Serbia. His Ph.D. degree is in Computer Science from Temple University (2003) under the direction of Prof. Zoran Obradovic and co-direction of Prof. Keith Dunker. In 2004 he held a post-doctoral position in Keith Dunker's lab at Indiana University School of Medicine, after which he joined Indiana University Bloomington. Prof. Radivojac's research is in the areas of computational biology and machine learning with specific interests in protein function, MS/MS proteomics, genome interpretation, and precision health. He received the National Science Foundation (NSF) CAREER Award in 2007 and is an August-Wilhelm Scheer Visiting Professor at Technical University of Munich (TUM) as well as an honorary member of the Institute for Advanced Study at TUM. At Indiana University, he was Associate Chair of the Department of Computer Science and a co-Director of all of Informatics and Data Science for the multi-campus Precision Health Initiative. Prof. Radivojac's projects have been regularly supported by NSF and National Institutes of Health (NIH). He is currently an Editorial Board member for the journal Bioinformatics, Associate Editor for PLoS Computational Biology, and is serving his third term (elected) on the Board of Directors of the International Society for Computational Biology (ISCB).