NeBcon: protein contact map prediction using neural network training coupled with naive Bayes classifiers

Cited by: 57
Authors
He, Baoji [1, 2, 3]
Mortuza, S. M. [3]
Wang, Yanting [1, 2]
Shen, Hong-Bin [3, 4]
Zhang, Yang [3, 5]
Affiliations
[1] Chinese Acad Sci, Inst Theoret Phys, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Phys Sci, Beijing 100049, Peoples R China
[3] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[4] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China
[5] Univ Michigan, Dept Biol Chem, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA);
Keywords
CORRELATED MUTATIONS; RESIDUE CONTACTS; SEQUENCE; COEVOLUTION; ALIGNMENTS; INFORMATION; SEARCH; SERVER;
DOI
10.1093/bioinformatics/btx164
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: Recent CASP experiments have witnessed exciting progress on folding large-size non-homologous proteins with the assistance of co-evolution based contact predictions. The success is, however, anecdotal, because these contact prediction methods require a high volume of sequence homologs, which are not available for most non-homologous protein targets. Developing efficient methods that can generate balanced and reliable contact maps for different types of protein targets is essential to enhance the success rate of ab initio protein structure prediction.

Results: We developed a new pipeline, NeBcon, which uses the naive Bayes classifier (NBC) theorem to combine eight state-of-the-art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained together with intrinsic structural features through neural network learning to produce the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, where it improves the accuracy of the best co-evolution based meta-server predictor by 22%; the improvement increases to 45% for hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement comes from the optimized NBC combination of complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found to be particularly important for proteins that do not have a sufficient number of homologous sequences to derive reliable co-evolution profiles.
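The naive Bayes combination step described in the abstract can be illustrated with a minimal sketch. This is not the NeBcon implementation: the score binning, the Laplace smoothing constant `alpha`, the function names (`to_bin`, `fit_nbc`, `posterior_contact`) and the synthetic training data are all assumptions made for illustration. It shows only how confidence scores from several predictors could be merged into one posterior contact probability under a conditional-independence assumption; the subsequent neural network stage is not covered.

```python
import math
import random


def to_bin(score, n_bins):
    """Map a confidence score in [0, 1] to a histogram bin index."""
    return min(max(int(score * n_bins), 0), n_bins - 1)


def fit_nbc(scores, labels, n_bins=10, alpha=1.0):
    """Estimate per-predictor score histograms for contacts and non-contacts.

    scores: list of rows, one row of predictor confidences per residue pair.
    labels: list of 0/1 values (1 = contact). Both classes must be present.
    alpha:  Laplace smoothing count (an assumption of this sketch).
    """
    n_pred = len(scores[0])
    counts = [[[alpha] * n_bins for _ in range(n_pred)] for _ in (0, 1)]
    class_totals = [0, 0]
    for row, y in zip(scores, labels):
        class_totals[y] += 1
        for k, s in enumerate(row):
            counts[y][k][to_bin(s, n_bins)] += 1
    # log P(score bin | class) for every class, predictor and bin
    log_lik = [
        [[math.log(c / sum(hist)) for c in hist] for hist in counts[y]]
        for y in (0, 1)
    ]
    log_prior = [math.log(class_totals[y] / len(labels)) for y in (0, 1)]
    return {"log_lik": log_lik, "log_prior": log_prior, "n_bins": n_bins}


def posterior_contact(model, row):
    """Posterior P(contact | all predictor scores), computed in log space."""
    log_joint = []
    for y in (0, 1):
        total = model["log_prior"][y]
        for k, s in enumerate(row):
            total += model["log_lik"][y][k][to_bin(s, model["n_bins"])]
        log_joint.append(total)
    top = max(log_joint)
    weights = [math.exp(v - top) for v in log_joint]
    return weights[1] / (weights[0] + weights[1])


# Synthetic data (hypothetical): 8 predictors, roughly 10% of pairs in contact.
rng = random.Random(0)
labels = [1 if rng.random() < 0.1 else 0 for _ in range(2000)]
scores = [
    [rng.betavariate(5, 2) if y else rng.betavariate(2, 5) for _ in range(8)]
    for y in labels
]
model = fit_nbc(scores, labels)
p_high = posterior_contact(model, [0.9] * 8)  # all predictors confident
p_low = posterior_contact(model, [0.1] * 8)   # all predictors doubtful
```

Working in log space avoids numerical underflow when many per-predictor likelihoods are multiplied, and the smoothing keeps empty histogram bins from producing zero probabilities.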
Pages: 2296 - 2306
Page count: 11