Detection of significant protein coevolution

Cited by: 26
Authors
Ochoa, David [1 ]
Juan, David [2 ]
Valencia, Alfonso [2 ]
Pazos, Florencio [1 ]
Affiliations
[1] CSIC, Natl Ctr Biotechnol CNB, Computat Syst Biol Grp, Madrid 28049, Spain
[2] Spanish Natl Canc Res Ctr CNIO, Struct Bioinformat Grp, Madrid 28029, Spain
Keywords
MULTIPLE SEQUENCE ALIGNMENT; PREDICTION; RESOURCE; DATABASE; TREES;
DOI
10.1093/bioinformatics/btv102
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: The evolution of proteins cannot be fully understood without taking into account the coevolutionary linkages entangling them. From a practical point of view, coevolution between protein families has been used as a way of detecting protein interactions and functional relationships from genomic information. The most common approach to inferring protein coevolution involves the quantification of phylogenetic tree similarity using a family of methodologies termed mirrortree. In spite of their success, a fundamental problem of these approaches is the lack of an adequate statistical framework to assess the significance of a given coevolutionary score (tree similarity). As a consequence, a number of ad hoc filters and arbitrary thresholds are required in an attempt to obtain a final set of confident coevolutionary signals.
Results: In this work, we developed a method for associating confidence estimators (P values) with the tree-similarity scores, using a null model specifically designed for the tree comparison problem. We show how this approach largely improves the quality and coverage (number of pairs that can be evaluated) of the detected coevolution in all the stages of the mirrortree workflow, independently of the starting genomic information. This not only leads to a better understanding of protein coevolution and its biological implications, but also yields a highly reliable and comprehensive network of predicted interactions, as well as information on the substructure of macromolecular complexes, using only genomic information.
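To make the idea of attaching a P value to a tree-similarity score concrete, the following is a minimal sketch, not the authors' method. It assumes that tree similarity is measured as the Pearson correlation between the inter-species distance matrices of two protein families (the usual mirrortree-style score), and it swaps in a generic Mantel-style permutation test as the null model. The null model designed in the paper is not described in this abstract and is not reproduced here. The function names and the toy distance matrices are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square, symmetric matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def tree_similarity(dist_a, dist_b):
    """Mirrortree-style score: correlation of the two distance matrices.

    Both matrices must list the same species in the same order.
    """
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


def permutation_p_value(dist_a, dist_b, n_perm=999, seed=0):
    """Empirical P value from a Mantel-style permutation test.

    ASSUMPTION: this generic null (shuffling species labels of one family)
    stands in for the paper's purpose-built null model.
    """
    rng = random.Random(seed)
    observed = tree_similarity(dist_a, dist_b)
    n = len(dist_b)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel the species of family B by permuting rows and columns together.
        shuffled = [[dist_b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if tree_similarity(dist_a, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so P is never zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy, made-up distance matrices for five species (illustration only).
FAMILY_A = [
    [0.0, 0.1, 0.4, 0.7, 0.9],
    [0.1, 0.0, 0.3, 0.6, 0.8],
    [0.4, 0.3, 0.0, 0.5, 0.7],
    [0.7, 0.6, 0.5, 0.0, 0.2],
    [0.9, 0.8, 0.7, 0.2, 0.0],
]
# Family B is family A with every distance doubled, so the two correlate perfectly.
FAMILY_B = [[2.0 * d for d in row] for row in FAMILY_A]

score, p_value = permutation_p_value(FAMILY_A, FAMILY_B)
print(f"tree similarity = {score:.3f}, empirical P = {p_value:.3f}")
```

With only five species there are just 120 possible relabellings, so the smallest attainable P value is coarse. A purpose-built null model, as proposed in the paper, is one way around such limits.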
Pages: 2166-2173
Number of pages: 8