Distance-based protein folding powered by deep learning

Cited by: 269
Author
Xu, Jinbo [1]
Affiliation
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Keywords
protein folding; deep learning; protein contact prediction; protein distance prediction; direct coupling analysis; residue-residue contacts; structure prediction; coupling analysis; I-TASSER; sequence; coevolution; maps
DOI
10.1073/pnas.1821309116
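Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract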
Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming, fragment-based conformation sampling. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ~60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we may construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets (median family size of 58 effective sequence homologs) within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts, where L is the protein sequence length. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins lacking similar structures in the Protein Data Bank, even on a personal computer.
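The abstract reports results as precision on the top L/5 long-range predicted contacts, derived from a predicted interresidue distance distribution. The sketch below is a minimal illustration, not the paper's code: it assumes a model that outputs, for each residue pair, probabilities over a set of distance bins whose edges (`BIN_UPPER_EDGES`) are hypothetical. It collapses each distribution into a contact probability P(d < 8 Å) and scores the top L/5 long-range pairs, using the common CASP conventions (contact: Cβ-Cβ distance below 8 Å; long range: sequence separation of at least 24). The names `contact_probability` and `top_l5_long_range_precision` are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical assumption: upper edges (in angstroms) of the distance bins the
# model predicts for each residue pair; the last bin is open-ended (> 15 A).
# This binning is illustrative, not the one used in the paper.
BIN_UPPER_EDGES: List[float] = [4.0, 6.0, 8.0, 10.0, 12.0, 15.0, float("inf")]

CONTACT_CUTOFF = 8.0     # contact: C-beta/C-beta distance below 8 A
MIN_LONG_RANGE_SEP = 24  # long range: sequence separation of at least 24

Pair = Tuple[int, int]


def contact_probability(dist_probs: Sequence[float]) -> float:
    """Collapse a binned distance distribution into P(distance < 8 A)
    by summing the bins whose upper edge lies at or below the cutoff."""
    return sum(p for p, upper in zip(dist_probs, BIN_UPPER_EDGES)
               if upper <= CONTACT_CUTOFF)


def top_l5_long_range_precision(pred: Dict[Pair, Sequence[float]],
                                true_dist: Dict[Pair, float],
                                length: int) -> float:
    """Fraction of the top L/5 long-range predicted contacts that are
    true contacts (native distance below the cutoff)."""
    # Score every long-range pair (i < j) by its predicted contact probability.
    scored = [(contact_probability(probs), i, j)
              for (i, j), probs in pred.items()
              if j - i >= MIN_LONG_RANGE_SEP]
    # Rank from most to least confident and keep the top L/5 pairs.
    scored.sort(key=lambda t: t[0], reverse=True)
    top = scored[:max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if true_dist[(i, j)] < CONTACT_CUTOFF)
    return hits / len(top)


if __name__ == "__main__":
    # Toy 30-residue example with synthetic predictions and native distances.
    L = 30
    near = [0.1, 0.5, 0.3, 0.05, 0.03, 0.01, 0.01]  # mass mostly below 8 A
    far = [0.0, 0.0, 0.05, 0.05, 0.1, 0.2, 0.6]     # mass mostly above 15 A
    pred, true_dist = {}, {}
    for i in range(L):
        for j in range(i + 1, L):
            pred[(i, j)] = near if i == 0 and j >= 24 else far
            true_dist[(i, j)] = 6.0 if i == 0 and j in (24, 25, 26) else 20.0
    print(top_l5_long_range_precision(pred, true_dist, L))  # 0.5
```

In the toy run, the six most confident long-range pairs are kept (L/5 = 6) and three of them are native contacts, giving a precision of 0.5. Summing bin probabilities is one straightforward way to turn a distance distribution into a contact score; it lets the same distance predictor be evaluated with contact-based metrics.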
Pages: 16856-16865
Number of pages: 10