New High-Res Map of Human Genome Unveiled

Four new papers co-authored by researchers from Associate Professor Manolis Kellis' Computational Biology group at CSAIL unveil a new high-resolution picture of the human genome. By carefully examining and comparing the genomes of 29 different mammals, Kellis, a principal investigator at CSAIL, and collaborators around the world have gained a better understanding of the evolution of the human genome. Seeing which parts have been preserved over time is a key step in understanding human biology and disease.
 
Findings of the initial study, “A high-resolution map of human evolutionary constraint using 29 mammals,” appeared in the Oct. 12 online edition of the journal Nature.
 
Starting in 2005, the Broad Institute, the Genome Institute at Washington University, and the Baylor College of Medicine Human Genome Sequencing Center sequenced the genomes of 29 placental mammals, including the bat, chimpanzee, dolphin and elephant.
 
In collaboration with Professor Kellis’ Computational Biology group, they developed computational methods to compare the 29 genomes and get a systematic view of the evolution of the human genome. The scientists searched for regions of the genome that have been preserved over evolutionary time, which are likely to be functional elements, and studied their specific patterns of conservation, known as "evolutionary signatures," which can give insights into what these elements do. These signatures can help distinguish which elements encode proteins or RNA structures, and which are regulatory elements that instead control gene expression.
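
The basic idea of searching for preserved regions can be sketched in a few lines of Python. This is only a toy illustration, not the method used in the papers: the four short aligned sequences, the function names `column_identity` and `conserved_runs`, and the thresholds are all hypothetical, and simple per-column identity stands in for the statistical models a real analysis would use.

```python
from typing import List, Tuple

def column_identity(alignment: List[str]) -> List[float]:
    """For each column, the fraction of non-reference sequences that
    match the first (reference) sequence at that position."""
    reference, others = alignment[0], alignment[1:]
    scores = []
    for i, ref_base in enumerate(reference):
        matches = sum(1 for seq in others if seq[i] == ref_base)
        scores.append(matches / len(others))
    return scores

def conserved_runs(scores: List[float],
                   threshold: float = 1.0,
                   min_length: int = 4) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where every column
    scores at least `threshold` for at least `min_length` columns."""
    runs = []
    start = None
    # A sentinel score below any threshold closes a run at the end.
    for i, score in enumerate(scores + [-1.0]):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs

# Hypothetical, already-aligned toy sequences (reference first).
toy_alignment = [
    "ACGTTGACCA",
    "ACGTTGACTA",
    "TCGTTGACCG",
    "GCGTTGACCA",
]

runs = conserved_runs(column_identity(toy_alignment))
print(runs)  # [(1, 8)]
```

In this toy case, columns 1 through 7 are identical in every sequence, so they are reported as a single conserved run. With more species in the comparison, shorter runs can be told apart from chance agreement with more confidence.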
 
“The 29-mammals comparison resulted in a global map of conserved functional elements in the human genome at the resolution of individual binding sites,” said Kellis, leader of the Computational Biology Group at CSAIL. “This allows us to now focus on any one region of the genome, and especially the vast unannotated intronic and intergenic regions that do not encode proteins, and immediately shine the spotlight on those elements that evolution has minutely preserved.”
 
This map can provide key information for understanding disease, as it helps focus attention on the five percent of the genome that is evolutionarily conserved, and thus much more likely to contain the elements necessary for proper gene function and gene regulation.
 
“Comparative genomics gives us a unique perspective on the human genome, one that is independent of the cell type or tissue,” said Manuel Garber, a researcher with CSAIL and the Broad Institute who contributed to the paper. “As we show in this and other works, the unprecedented depth of this dataset can be used as a powerful complement to other cell type or tissue specific datasets.”
 
“For the first time, we can actually recognize nearly all conserved elements at high resolution, regardless of the specific cell types or conditions they act in,” said Kellis. “This allows us to revisit disease-associated regions, and prioritize functional study of those mutations that disrupt conserved regions.”
 
While previous studies had correctly estimated that only five percent of the genome is under selection, researchers lacked sufficient power to pinpoint the conserved elements until this study. The key advance in this study is the new resolution of about 10 nucleotides, corresponding roughly to the length of sequences recognized by transcriptional regulators. The reported elements cover about 4.2 percent of the human genome, or more than 80 percent of the estimated fraction of the genome under selection, a great advance towards the complete annotation of the human genome.
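
The "more than 80 percent" figure follows directly from the two percentages quoted above, as this short check shows. The variable names are illustrative only.

```python
# Both figures are taken from the text; each is a percentage of the human genome.
estimated_under_selection = 5.0   # estimated fraction under selection
covered_by_elements = 4.2         # fraction covered by the reported elements

# Share of the selected fraction that the reported elements account for.
share_of_selected = covered_by_elements / estimated_under_selection
print(f"{share_of_selected:.0%}")  # 84%
```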
 
The map contains 2.7 million conserved instances of regulatory motifs that are directly responsible for binding of transcription factors to DNA, and form the building blocks of regulatory grammars that control development, differentiation, and cellular response to external stimuli.
 
“Because of their short nature, regulatory motif instance identification was one of the genomic features that most dramatically benefited from the additional species and will continue to benefit from additional sequencing,” said Pouya Kheradpour, a graduate student at CSAIL studying under Kellis. “The high number of species, size of the genomes, and sequencing strategy posed a number of challenges for us that we overcame by improving our algorithmic techniques.”
 
Through the comparative analysis of the 29 mammalian genomes, researchers also uncovered several surprising biological facts about the human genome. The details appear in three additional papers published online in Genome Research, also on Oct. 12.
 
Researchers found that thousands of regions within human protein-coding genes also encode additional functional elements, described in a paper in Genome Research first-authored by Michael F. Lin, a graduate student in the MIT Computational Biology group at CSAIL.
 
“This is akin to discovering that an old letter that we had read hundreds of times in fact contains additional hidden messages written between the lines,” said Kellis. “These play diverse roles in gene expression, splicing, translation, and degradation, and are involved in both pre- and post-transcriptional gene control, providing a much richer view of gene regulation.”
 
The researchers also discovered thousands of new RNA structures that fall in hundreds of new RNA families, providing the opportunity for systematic study of additional types of gene regulation at the RNA level.
 
Additionally, researchers found previously unreported evidence that translation of some genes does not stop at the first stop codon, the genetic equivalent of a red light. Instead, translation continues until a second or third stop codon, resulting in additional domains that may change the function of the corresponding protein. The finding could prove useful in understanding gene regulation at the translational level.
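
A minimal sketch can make the effect of reading through a stop codon concrete. Everything here is a simplifying assumption for illustration: the coding sequence is made up, the codon table is a five-entry subset of the standard genetic code written with DNA letters, and the placeholder "X" marks the read-through position because the amino acid inserted there is not specified in the text.

```python
# Small subset of the standard genetic code (DNA alphabet), enough for the toy example.
CODON_TABLE = {"ATG": "M", "GCT": "A", "AAA": "K", "GGA": "G", "TTT": "F"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def translate(cds: str, stops_to_read_through: int = 0) -> str:
    """Translate codon by codon, ignoring the first
    `stops_to_read_through` stop codons and halting at the next one."""
    protein = []
    skipped = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CODONS:
            if skipped == stops_to_read_through:
                break  # normal termination
            skipped += 1
            protein.append("X")  # placeholder for the read-through position
        else:
            protein.append(CODON_TABLE[codon])
    return "".join(protein)

# Hypothetical sequence: ATG GCT AAA TGA GGA TTT TAA
toy_cds = "ATGGCTAAATGAGGATTTTAA"

print(translate(toy_cds))     # MAK
print(translate(toy_cds, 1))  # MAKXGF
```

Stopping at the first stop codon yields a three-residue product, while reading through it extends the product until the second stop codon. That extension is the kind of additional sequence the researchers describe as potentially changing a protein's function.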
 
A better understanding of the human genome is a prerequisite to better understanding human disease. The results of this study, according to Kellis, deliver on the original goal of the Human Genome Project by providing a detailed picture of the genome for further research.
 
“The Human Genome Project had the ultimate goal of enabling disease studies and enabling a deeper understanding of human biology,” said Kellis. “This project brings us one step closer to that original goal, and the resulting annotations and biological insights can serve as a prerequisite for countless disease studies going forward.”
 
This project was supported by the National Human Genome Research Institute, the National Institute of General Medical Sciences, the European Science Foundation, the National Science Foundation, the Sloan Foundation, an Erwin Schrödinger Fellowship, the Gates Cambridge Trust, the Novo Nordisk Foundation, the University of Copenhagen, the David and Lucile Packard Foundation, the Danish Council for Independent Research Medical Sciences, and the Lundbeck Foundation.

Abby Abazorius, CSAIL