Fast and Accurate Co-Estimation of Large-scale Phylogenetic Alignments and Trees
Speaker: Kevin Liu, Department of Computer Science, The University of Texas at Austin
Date: April 13, 2011
Time: 9:00AM to 10:00AM
Location: 32-G449, Kiva
Host: Manolis Kellis, MIT CSAIL
Contact: Teresa Cataldo, firstname.lastname@example.org
Phylogenetic trees and multiple sequence alignments play important roles in a wide range of biological research, including the reconstruction of the Tree of Life -- the evolutionary history of all organisms on Earth -- and the development of vaccines and antibiotics. The importance of phylogenetic trees and alignments drives interdisciplinary research to create methods to reconstruct them accurately and efficiently.
Traditionally, phylogenetic studies proceed in two phases: first, a multiple sequence alignment is produced from biomolecular sequences collected from different groups of organisms, and, second, a tree is produced using the alignment to describe the evolutionary relationships among the groups of organisms. Two-phase methods return reasonably accurate alignments and trees on datasets with at most 200 or so sequences and low sequence divergence. However, the alignment and topological accuracy of these methods degrades as datasets grow larger and/or sequences become more divergent. Alternatively, methods have been developed to simultaneously estimate phylogenetic alignments and trees. These methods are either too computationally intensive to analyze datasets with more than a few hundred sequences or no more accurate than two-phase methods.
Today's phylogenetic studies include a greater number and variety of sequenced organisms than ever before, especially due to exponential growth in affordable sequencing and computing power. Thus, a primary challenge is efficient and accurate estimation of large-scale alignments and trees. To address this challenge, I have developed SATe.
SATe is the first fast and accurate method for simultaneous estimation of alignments and trees on datasets with up to several thousand nucleotide sequences. Using an empirical study on biological and synthetic datasets, I show that SATe improves upon the alignment and topological accuracy of all existing methods while retaining reasonable computational requirements.