A multi-scale coevolutionary approach to predict interactions between protein domains

Cited by: 19
Authors
Croce, Giancarlo [1]
Gueudre, Thomas [2]
Cuevas, Maria Virginia Ruiz [1]
Keidel, Victoria [3]
Figliuzzi, Matteo [1]
Szurmant, Hendrik [3]
Weigt, Martin [1]
Affiliations
[1] Sorbonne Univ, Inst Biol Paris Seine, Biol Computat & Quantitat LCQB, CNRS, Paris, France
[2] Italian Inst Genom Med, Turin, Italy
[3] Western Univ Hlth Sci, Coll Osteopath Med Pacific, Dept Basic Med Sci, Pomona, CA USA
Funding
EU Horizon 2020;
Keywords
COMPARATIVE GENOME ANALYSIS; ESCHERICHIA-COLI; STRUCTURAL BASIS; RESIDUE; IDENTIFICATION; DATABASE; CONTACTS; FAMILY; INFORMATION; INTERFACES;
DOI
10.1371/journal.pcbi.1006891
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species to correlations in amino-acid usage. Genomic databases provide rapidly growing data on variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interactions between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
Author summary
Interactions between proteins and their domains are at the basis of almost all biological processes. To complement labor-intensive and error-prone experimental approaches to the genome-scale characterization of such interactions, we propose a computational approach based upon rapidly growing protein-sequence databases. To maintain interaction in the course of evolution, proteins and their domains are required to coevolve: evolutionary changes in the interaction partners appear correlated across several scales, from correlated presence-absence patterns of proteins across species up to correlations in amino-acid usage. Our approach combines these different scales within a common mathematical-statistical inference framework, which is inspired by the so-called direct coupling analysis. It is able to predict currently unknown, but biologically sensible interactions, and to identify cases of convergent evolution leading to alternative solutions for a common biological task. Thereby our work illustrates the potential of global statistical inference for the genome-scale coevolutionary analysis of interacting proteins and protein domains.
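The contrast the abstract draws between correlation-based phylogenetic profiling and direct couplings can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `profile_correlations` and `mean_field_couplings` are hypothetical, the data are synthetic, and the couplings are estimated with a simple mean-field inversion of the covariance matrix, which is a stand-in and not necessarily the inference scheme used in the paper. Three domains are simulated as a chain A, B, C, in which C depends on A only through B.

```python
import numpy as np


def profile_correlations(profiles):
    """Pearson correlation between presence/absence columns (one column per domain)."""
    return np.corrcoef(profiles, rowvar=False)


def mean_field_couplings(profiles, ridge=1e-3):
    """Mean-field coupling estimate J = -(C + ridge * I)^-1 with the diagonal zeroed.

    This is an illustrative approximation (an assumption), not the paper's method.
    """
    cov = np.cov(profiles, rowvar=False)
    cov += ridge * np.eye(cov.shape[0])  # small ridge keeps the inversion stable
    couplings = -np.linalg.inv(cov)
    np.fill_diagonal(couplings, 0.0)
    return couplings


# Synthetic phylogenetic profiles (assumed data): rows are species, columns are domains.
rng = np.random.default_rng(0)
n_species = 2000
a = rng.random(n_species) < 0.5            # domain A present in about half the species
b = a ^ (rng.random(n_species) < 0.1)      # B copies A, flipped in about 10% of species
c = b ^ (rng.random(n_species) < 0.1)      # C copies B, so A and C are linked only via B
profiles = np.stack([a, b, c], axis=1).astype(float)

corr = profile_correlations(profiles)
couplings = mean_field_couplings(profiles)

print("correlation A-C:", corr[0, 2])
print("coupling A-B:", couplings[0, 1])
print("coupling A-C:", couplings[0, 2])
```

In this toy setting the A-C correlation is substantial even though the two domains are only indirectly related, while the A-C coupling is small relative to the A-B and B-C couplings. That is the sense in which a global model separates direct from indirect signals.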
Pages: 21