Speaker: Anshul Kundaje
Massachusetts Institute of Technology
Title: "Unraveling the functional heterogeneity and diversity of regulatory elements in the human genome"
In 2003, the Human Genome Project marked a major scientific milestone by releasing the first consensus DNA sequence of a complete human genome. The ENCODE (Encyclopedia of DNA Elements) project was launched to pick up where the Human Genome Project left off, with the goal of systematically deciphering the function of every base (letter) in the genome. Over the last four years, ENCODE has generated extensive genome-wide functional genomic data measuring the activity of thousands of cellular moieties in a variety of normal and diseased cellular contexts. In this talk, I will describe novel computational and machine learning approaches that I developed for integrative analysis of ENCODE data in order to unravel the functional heterogeneity of regulatory elements in the human genome and their implications for human disease. I will begin with a gentle introduction to the diversity and scale of ENCODE data and a brief overview of robust statistical methods that I developed for automated detection of DNA binding sites of regulatory proteins from massive collections of noisy experimental data. Regulatory proteins can perform multiple functions by interacting with and co-binding DNA with different combinations of other regulatory proteins. I developed a novel discriminative machine learning formulation based on regularized rule-based ensembles that was able to sort through the combinatorial complexity of possible regulatory interactions and learn statistically significant item-sets of co-binding events at an unprecedented level of detail. I discovered a large number of novel pairwise and higher-order interactions, several of which were experimentally validated.
I found extensive evidence that regulatory proteins could switch co-binding partners at different sets of regulatory domains within a single cell type and across different cell types, thereby affecting patterns of other chemical modifications to DNA and regulating different functional categories of target genes. Finally, I will present a novel approach that exploits ENCODE data to significantly improve interpretation of human disease studies. Massive case-control studies involving comparative analysis of the DNA sequences of diseased and healthy individuals have been reasonably successful at identifying genomic variants (mutations) associated with various human diseases. However, understanding the functional impact of these mutations has been very challenging. Using functional elements discovered from ENCODE data, we were able to identify and prioritize functional variants, provide a functional annotation for up to 81% of all publicly available disease-associated variants, and generate new hypotheses by integrating multiple sources of data. Together, these efforts take us one step closer to learning unified models of regulatory mechanisms in humans and improve our system-level understanding of cellular processes and complex diseases.
Anshul Kundaje is a Research Scientist in the Computer Science Department at the Massachusetts Institute of Technology and the Broad Institute of MIT and Harvard. His primary research interest is large-scale computational genomics. He specializes in developing statistical and machine learning methods for massive integrative analysis of heterogeneous, high-throughput functional genomics data to learn models of gene regulation and improve interpretation of disease studies. He completed his PhD in Computer Science at Columbia University (2003-2008) under the guidance of Dr. Christina Leslie, where he developed novel machine learning methods to learn models of gene regulation in yeast. He conducted postdoctoral research (2008-2012) in the Computer Science Department at Stanford University, mentored by Profs. Serafim Batzoglou and Arend Sidow. He served as the lead data coordinator and primary integrative analyst for the human ENCODE (Encyclopedia of DNA Elements) Project, where he developed novel computational approaches for large-scale, integrative analysis of diverse functional genomics data to decipher the complexity of the largest collection of functional elements in the human genome. He was the second author (first non-PI author) among 590 co-authors on the flagship ENCODE paper published in Nature in 2012. He also published nine other high-impact companion papers in the prestigious journals Nature, Genome Research and Genome Biology in 2012. Currently at MIT, he works with Prof. Manolis Kellis's group as one of the primary computational analysts of the NIH Roadmap Epigenomics Project, studying how functional genomic signals vary across cell types, individuals and organisms. He is now keen to start his own research program and collaborate extensively with computational and experimental groups to explore novel, large-scale machine learning approaches to improve our system-level understanding of cellular regulatory mechanisms, human health and disease.
Joint talk with the Donnelly Centre for Cellular and Biomolecular Research and the Department of Computer Science