Distance-based protein folding powered by deep learning

Cited by: 269

Authors:

Xu, Jinbo [1]

Affiliations:

[1] Toyota Technological Institute at Chicago, Chicago, IL 60637, USA

Source:

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA | 2019 / Vol. 116 / Issue 34

Keywords:

protein folding; deep learning; protein contact prediction; protein distance prediction; direct coupling analysis; RESIDUE-RESIDUE CONTACTS; STRUCTURE PREDICTION; COUPLING ANALYSIS; I-TASSER; SEQUENCE; COEVOLUTION; MAPS;

DOI:

10.1073/pnas.1821309116

Chinese Library Classification (CLC) codes:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Direct coupling analysis (DCA) for protein folding has made very good progress, but it is not effective for proteins that lack many sequence homologs, even when coupled with time-consuming fragment-based conformation sampling. We show that we can accurately predict the interresidue distance distribution of a protein by deep learning, even for proteins with ~60 sequence homologs. Using only the geometric constraints given by the resulting distance matrix, we can construct 3D models without extensive conformation sampling. Our method successfully folded 21 of the 37 CASP12 hard targets, which have a median family size of 58 effective sequence homologs, within 4 h on a Linux computer with 20 central processing units. In contrast, DCA-predicted contacts cannot be used to fold any of these hard targets in the absence of extensive conformation sampling, and the best CASP12 group folded only 11 of them by integrating DCA-predicted contacts into fragment-based conformation sampling. Rigorous experimental validation in CASP13 shows that our distance-based folding server successfully folded 17 of 32 hard targets (with a median family size of 36 sequence homologs) and obtained 70% precision on the top L/5 long-range predicted contacts. The latest experimental validation in CAMEO shows that our server predicted correct folds for 2 membrane proteins while all of the other servers failed. These results demonstrate that it is now feasible to predict the correct fold for many more proteins that lack similar structures in the Protein Data Bank, even on a personal computer.
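The abstract reports precision on the top L/5 long-range predicted contacts without spelling out the metric. The sketch below shows one common way such a score is computed. It assumes the usual CASP-style conventions, which the abstract itself does not state: a contact is a residue pair closer than 8 Å, and "long-range" means a sequence separation of at least 24 residues. The function name and parameters are illustrative and are not taken from the paper.

```python
import numpy as np

def top_l5_long_range_precision(pred_prob, true_dist, min_sep=24, contact_cutoff=8.0):
    """Precision of the top L/5 predicted long-range contacts.

    pred_prob: (L, L) symmetric array of predicted contact probabilities.
    true_dist: (L, L) symmetric array of native inter-residue distances (angstroms).
    min_sep and contact_cutoff are assumed conventions, not values from the abstract.
    """
    L = pred_prob.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep, i.e. long-range pairs only.
    i, j = np.triu_indices(L, k=min_sep)
    if i.size == 0:
        raise ValueError("sequence too short to contain long-range pairs")
    k = max(1, L // 5)
    # Indices of the k highest-scoring long-range pairs.
    order = np.argsort(-pred_prob[i, j], kind="stable")[:k]
    # A predicted pair counts as correct if its native distance is below the cutoff.
    is_contact = true_dist[i[order], j[order]] < contact_cutoff
    return float(is_contact.mean())
```

Under these assumptions, a score of 0.70 would mean that 70% of the L/5 highest-ranked long-range pairs are true contacts in the native structure.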


Pages: 16856-16865

Number of pages: 10
