Protein structure prediction using deep learning distance and hydrogen-bonding restraints in CASP14

Cited by: 47
Authors
Zheng, Wei [1]
Li, Yang [1, 2]
Zhang, Chengxin [1]
Zhou, Xiaogen [1]
Pearce, Robin [1]
Bell, Eric W. [1]
Huang, Xiaoqiang [1]
Zhang, Yang [1, 3]
Affiliations
[1] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[2] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China
[3] Univ Michigan, Dept Biol Chem, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA)
Keywords
ab initio folding; CASP14; deep learning; domain partition; multiple sequence alignment; protein structure prediction; residue-residue distance prediction; FOLD-RECOGNITION; I-TASSER; SIMILARITY; SERVER;
DOI
10.1002/prot.26193
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
In this article, we report 3D structure prediction results by two of our best server groups ("Zhang-Server" and "QUARK") in CASP14. These two servers were built based on the D-I-TASSER and D-QUARK algorithms, which integrated four newly developed components into the classical protein folding pipelines, I-TASSER and QUARK, respectively. The new components include: (a) a new multiple sequence alignment (MSA) collection tool, DeepMSA2, which is extended from the DeepMSA program; (b) a contact-based domain boundary prediction algorithm, FUpred, to detect protein domain boundaries; (c) a residual convolutional neural network-based method, DeepPotential, to predict multiple spatial restraints by co-evolutionary features derived from the MSA; and (d) optimized spatial restraint energy potentials to guide the structure assembly simulations. For 37 FM targets, the average TM-scores of the first models produced by D-I-TASSER and D-QUARK were 96% and 112% higher than those constructed by I-TASSER and QUARK, respectively. The data analysis indicates noticeable improvements produced by each of the four new components, especially for the newly added spatial restraints from DeepPotential and the well-tuned force field that combines spatial restraints, threading templates, and generic knowledge-based potentials. However, challenges still exist in the current pipelines. These include difficulties in modeling multi-domain proteins due to low accuracy in inter-domain distance prediction and modeling protein domains from oligomer complexes, as the co-evolutionary analysis cannot distinguish inter-chain and intra-chain distances. Specifically tuning the deep learning-based predictors for multi-domain targets and protein complexes may be helpful to address these issues.
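The abstract reports gains as relative increases in average TM-score. As a minimal sketch of how such figures are typically computed, the Python below evaluates the standard TM-score formula for a fixed superposition and the percentage gain between two averages. The function names, the 0.5 Å floor on d0, and all input numbers are illustrative assumptions, not values or code from the paper; the full TM-score also searches over superpositions, which is omitted here.

```python
def d0(target_length):
    """Length-dependent distance scale (in Angstroms) of the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distances (Angstroms) between aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    scale = d0(target_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / target_length


def relative_improvement(new_mean, old_mean):
    """Percentage by which new_mean exceeds old_mean."""
    return (new_mean - old_mean) / old_mean * 100.0


if __name__ == "__main__":
    # Hypothetical averages chosen only to illustrate the arithmetic.
    print(round(relative_improvement(0.49, 0.25), 1))
```

With these hypothetical averages, a rise from 0.25 to 0.49 corresponds to a 96% relative improvement, which is the sense in which "96% higher" is used above.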
Pages: 1734-1751
Page count: 18