Protein tertiary structure prediction and refinement using deep learning and Rosetta in CASP14

Citations: 34
Authors
Anishchenko, Ivan [1, 2]
Baek, Minkyung [1, 2]
Park, Hahnbeom [1, 2]
Hiranuma, Naozumi [1, 2, 3]
Kim, David E. [1, 2, 4]
Dauparas, Justas [1, 2]
Mansoor, Sanaa [1, 2]
Humphreys, Ian R. [1, 2]
Baker, David [1, 2, 4]
Affiliations
[1] Univ Washington, Dept Biochem, Seattle, WA 98195 USA
[2] Univ Washington, Inst Prot Design, Seattle, WA 98195 USA
[3] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[4] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
Keywords
deep learning; metagenomes; protein structure prediction; refinement; Rosetta; POTENTIALS; ENABLES;
DOI
10.1002/prot.26194
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
The trRosetta structure prediction method employs deep learning to generate predicted residue-residue distance and orientation distributions from which 3D models are built. We sought to improve the method by incorporating as inputs (in addition to sequence information) both language model embeddings and template information weighted by sequence similarity to the target. We also developed a refinement pipeline that recombines models generated by template-free and template-utilizing versions of trRosetta, guided by the DeepAccNet accuracy predictor. Both benchmark tests and CASP results show that the new pipeline is a considerable improvement over the original trRosetta; it is also faster and requires fewer computing resources, completing the entire modeling process in a median time of under 3 h in CASP14. Our human group improved results with this pipeline primarily by identifying additional homologous sequences for input into the network. We also used the DeepAccNet accuracy predictor to guide Rosetta high-resolution refinement for submissions in the regular and refinement categories; although performance was quite good on a CASP relative scale, the overall improvements were rather modest, in part due to missing inter-domain or inter-chain contacts.
Pages: 1722-1733
Page count: 12