Deep-learning contact-map guided protein structure prediction in CASP13

Cited by: 132
Authors
Zheng, Wei [1]
Li, Yang [1,2]
Zhang, Chengxin [1]
Pearce, Robin [1]
Mortuza, S. M. [1]
Zhang, Yang [1,3]
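Affiliations
[1] University of Michigan, Department of Computational Medicine and Bioinformatics, Ann Arbor, MI 48109, USA
[2] Nanjing University of Science and Technology, School of Computer Science and Engineering, Nanjing, Jiangsu, People's Republic of China
[3] University of Michigan, Department of Biological Chemistry, Ann Arbor, MI 48109, USA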
Keywords
ab initio folding; CASP13; contact prediction; deep convolutional neural networks; deep multiple sequence alignment; protein structure prediction; I-TASSER; DOMAIN PREDICTION; SEQUENCE; SIMILARITY; SERVER
DOI
10.1002/prot.25792
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
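The abstract does not spell out how a contact map is defined or how its accuracy is measured. As a minimal sketch, assuming the convention commonly used in CASP contact assessment (two residues are in contact when their Cβ atoms, or Cα for glycine, are closer than 8 Å), the Python below builds a native contact set from coordinates and scores a ranked list of predicted pairs by top-k precision. The function names, the sequence-separation cutoff, and the toy hairpin coordinates are illustrative assumptions; this is not the C-I-TASSER, C-QUARK, NeBcon, or ResPRE implementation.

```python
import math

def contact_map(coords, threshold=8.0, min_separation=6):
    """Return the set of residue pairs (i, j), i < j, whose representative
    atoms are closer than `threshold` angstroms and at least
    `min_separation` positions apart in sequence (both values are assumptions)."""
    contacts = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts

def top_k_precision(scores, native_contacts, k):
    """Fraction of the k highest-scoring predicted pairs that are native contacts.
    `scores` maps (i, j) pairs to predicted contact probabilities."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in native_contacts)
    return hits / len(ranked)

# Toy hairpin: two antiparallel 4-residue strands, 5 angstroms apart.
coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (11.4, 5.0, 0.0), (7.6, 5.0, 0.0), (3.8, 5.0, 0.0), (0.0, 5.0, 0.0),
]
native = contact_map(coords)

# Hypothetical predictor output: pair -> contact probability.
scores = {(0, 7): 0.9, (2, 7): 0.7, (0, 6): 0.4, (1, 7): 0.2}
precision_top2 = top_k_precision(scores, native, 2)
```

In contact-prediction benchmarks, k is usually tied to the sequence length L (for example the top L or L/2 pairs); the fixed k here only keeps the toy example small.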
Pages: 1149-1164
Page count: 16