Two New Algorithms for Phylogenetics and RNA structure
Speaker: Prof. Dan Gusfield, UC Davis
Date: Monday, August 6 2012
Time: 4:00PM to 5:30PM
Location: MIT Stata Center 32G-449 (Patil/Kiva) at 32 Vassar
Host: Manolis Kellis
Contact: derek aylward, 617 715 4882, derek.aylward@gmail.com
Relevant URL: In this talk, I will discuss the basic results in two current research
areas in my group. The first topic is on algorithms for phylogenetics
and their relation to chordal graph theory. The second topic is the
worst-case speed-up of algorithms for RNA secondary structure
prediction.
The Multi-State Perfect Phylogeny Problem is a natural extension of
the Binary Perfect Phylogeny (or Compatibility) Problem, allowing
evolutionary characters to take on more than two states. The basic
problem is to determine whether multi-state data can be derived on a
multi-state perfect phylogeny. In population genetics terminology,
the binary problem is to determine if data fits the "infinite-sites"
model, while the multi-state problem is to determine if data fits the
"infinite-alleles" model. The problem is of interest due to prior
elegant mathematical and algorithmic results, and due to non-binary,
but integer, data that is increasingly becoming available.
In 1975, Buneman showed a how to view the multi-state perfect
phylogeny problem as a problem of triangulating non-chordal graphs,
but that result has not been widely exploited as a computational tool,
despite a robust and continuing literature on the problem of
triangulating non-chordal graphs. In this talk, I discuss our recent
work on exploiting the chordal graph approach to study multi-state
perfect phylogeny problems.
The second part of the talk switches to RNA secondary structure
prediction. The problem of computationally predicting the secondary
structure (or folding) of RNA molecules was first introduced more than
thirty years ago and yet continues to be an area of active research
and development. The basic RNA-folding problem of finding a maximum
cardinality, non-crossing, matching of complimentary nucleotides in
an RNA sequence of length n, has an O(n^3)-time dynamic programming
solution that is widely applied. It was also known that an o(n^3)
worst-case time solution is possible, but the published and suggested
methods are complex and have not been established to be practical.
Recently, we give a simple, complete, and practical Four-Russians
algorithm for the basic RNA-folding problem, achieving a worst-case
time-bound of O(n^3/log(n)). This was more recently reduced to
O(n^3/log^2(n)), without using work-tricks. Implementations show that
the method achieves consistent speed-ups in practice. The
contribution is both theoretical and practical, since the basic
RNA-folding problem is often solved multiple times in the inner-loop
of more complex algorithms. We have extended the worst-case results to
other RNA folding and comparison problems.
See other events happening in August 2012