Simultaneous Clustering of Multiple Gene Expression and Physical Interaction Datasets

Cited by: 28
Authors
Narayanan, Manikandan [1]
Vetta, Adrian [2, 3]
Schadt, Eric E. [1]
Zhu, Jun [1]
Affiliations
[1] Rosetta Inpharmat Merck, Dept Genet, Seattle, WA USA
[2] McGill Univ, Dept Math & Stat, Montreal, PQ, Canada
[3] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
Keywords
SACCHAROMYCES-CEREVISIAE; TRANSCRIPTIONAL CONTROL; NETWORK; DISCOVERY; RECONSTRUCTION; PREDICTION; PATHWAYS; ONTOLOGY
DOI
10.1371/journal.pcbi.1000742
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm, JointCluster, that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees, coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks, make JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternative methods in recovering clusters implanted in networks with high false positive rates. In a systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.
Pages: 13