Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning

Cited by: 10
Authors
Adhikari, Badri [1]
Hou, Jie [2]
Cheng, Jianlin [2]
Affiliations
[1] Univ Missouri, Dept Math & Comp Sci, Columbia, MO USA
[2] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO 65211 USA
Keywords
CASP; coevolution; deep learning; machine learning; multiple sequence alignment; protein contact prediction; residue-residue contacts; reconstruction; networks; maps
DOI
10.1002/prot.25405
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
In this study, we evaluate the residue-residue contacts predicted by our three methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments, from which coevolution-based features are derived and then integrated by a neural network to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that three factors drive the quality of predicted protein contacts: the quality and effective depth of the multiple sequence alignments, the coevolution-based features, and the machine learning integration of coevolution-based and traditional features. On the full CASP12 dataset, when the top L/5 predicted long-range contacts are evaluated, the coevolution-based features alone improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises it to 56.3%. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66.
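The "top L/5 long-range precision" quoted in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact evaluation protocol, so the following are assumptions: a pair counts as long-range when its sequence separation is at least 24 residues (the usual CASP convention), L is the domain length, and the function name, argument layout, and toy data are hypothetical and chosen only for illustration.

```python
def top_l5_long_range_precision(predictions, true_contacts, length, min_sep=24):
    """Precision of the top L/5 long-range contact predictions.

    predictions   : iterable of (i, j, score) with residue indices i, j
    true_contacts : set of (i, j) pairs, i < j, in contact in the native structure
    length        : number of residues L in the domain
    min_sep       : minimum |i - j| for a "long-range" pair
                    (24 is assumed here, following common CASP practice)
    """
    # Keep only long-range pairs, normalised so that i < j.
    long_range = [
        (min(i, j), max(i, j), score)
        for i, j, score in predictions
        if abs(i - j) >= min_sep
    ]
    # Rank by predicted score, most confident first.
    long_range.sort(key=lambda p: p[2], reverse=True)
    # Select the top L/5 pairs (at least one).
    n_top = max(1, length // 5)
    top = long_range[:n_top]
    if not top:
        return 0.0
    # Precision = correct predictions / predictions evaluated.
    hits = sum(1 for i, j, _ in top if (i, j) in true_contacts)
    return hits / len(top)


# Toy example (made-up data): a 30-residue domain, so L/5 = 6 pairs are scored.
toy_predictions = [
    (1, 30, 0.95), (2, 27, 0.90), (5, 29, 0.80), (3, 10, 0.99),
    (4, 28, 0.70), (1, 26, 0.60), (6, 30, 0.50), (2, 30, 0.40),
]
toy_true = {(1, 30), (5, 29), (4, 28), (6, 30), (3, 10)}
precision = top_l5_long_range_precision(toy_predictions, toy_true, length=30)
print(f"top L/5 long-range precision: {precision:.3f}")
```

In the toy data the pair (3, 10) has the highest score but is ignored because it is not long-range; four of the six evaluated pairs are true contacts. Averaging this quantity over all evaluated domains would give a figure comparable in kind to the percentages reported above.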
Pages: 84-96
Page count: 13