AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Cited by: 41
Authors
Bordin, Nicola [1]
Sillitoe, Ian [1]
Nallapareddy, Vamsi [1]
Rauer, Clemens [1]
Lam, Su Datt [1,2]
Waman, Vaishali P. [1]
Sen, Neeladri [1]
Heinzinger, Michael [3]
Littmann, Maria [3]
Kim, Stephanie [4,5]
Velankar, Sameer [6]
Steinegger, Martin [4,5]
Rost, Burkhard [3,7,8]
Orengo, Christine [1]
Affiliations
[1] UCL, Inst Struct & Mol Biol, London WC1E 6BT, England
[2] Univ Kebangsaan Malaysia, Fac Sci & Technol, Dept Appl Phys, Bangi 43600, Selangor, Malaysia
[3] TUM Tech Univ Munich, Dept Informat Bioinformat & Computat Biol, i12,Boltzmannstr 3, D-85748 Munich, Germany
[4] Seoul Natl Univ, Sch Biol Sci, Seoul, South Korea
[5] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
[6] European Bioinformat Inst, European Mol Biol Lab, Hinxton, England
[7] Inst Adv Study TUM IAS, Lichtenbergstr 2a, D-85748 Munich, Germany
[8] Alte Akad 8, TUM Sch Life Sci Weihenstephan WZW, Freising Weihenstephan, Germany
Funding
Biotechnology and Biological Sciences Research Council (UK); National Research Foundation, Singapore; Wellcome Trust (UK);
Keywords
CLASSIFICATION; SEQUENCE; PREDICTION; CATH; DATABASE; IMPACT;
DOI
10.1038/s42003-023-04488-9
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
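The counts quoted in the abstract can be turned into rough derived figures. The sketch below is illustrative only: it uses just the numbers stated in the abstract, treats "~370,000" as exactly 370,000 (an assumption, so the derived counts are approximate), and the function and key names are hypothetical.

```python
# Back-of-the-envelope figures derived only from numbers quoted in the abstract.
# Assumption: "~370,000" is treated as exactly 370,000, so results are approximate.
TOTAL_CONFIDENT_MODELS = 370_000
ASSIGNED_FRACTION = 0.92             # share assigned to existing CATH superfamilies
PUTATIVE_NOVEL_SUPERFAMILIES = 2367  # clusters formed by the unassigned models
MANUALLY_ANALYSED = 618              # putative superfamilies with a human relative
CONFIRMED_NOVEL = 25                 # superfamilies confirmed as novel


def abstract_summary():
    """Return approximate counts and ratios implied by the abstract."""
    assigned = round(TOTAL_CONFIDENT_MODELS * ASSIGNED_FRACTION)
    unassigned = TOTAL_CONFIDENT_MODELS - assigned
    return {
        "assigned_models": assigned,
        "unassigned_models": unassigned,
        # Fraction of putative novel superfamilies that were manually analysed.
        "analysed_share": MANUALLY_ANALYSED / PUTATIVE_NOVEL_SUPERFAMILIES,
        # Fraction of the manually analysed set that was confirmed as novel.
        "confirmed_rate": CONFIRMED_NOVEL / MANUALLY_ANALYSED,
    }


if __name__ == "__main__":
    for name, value in abstract_summary().items():
        print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions, roughly 340,000 models fall into existing superfamilies and about 30,000 remain, and only about 4% of the manually analysed putative superfamilies were confirmed as novel.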
Pages: 12
Related papers
56 in total
[1]   SCOP2 prototype: a new approach to protein structure mining [J].
Andreeva, Antonina ;
Howorth, Dave ;
Chothia, Cyrus ;
Kulesha, Eugene ;
Murzin, Alexey G. .
NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) :D310-D314
[2]  
[Anonymous], CATH CLUST CATH TOOL
[3]   Accurate prediction of protein structures and interactions using a three-track neural network [J].
Baek, Minkyung ;
DiMaio, Frank ;
Anishchenko, Ivan ;
Dauparas, Justas ;
Ovchinnikov, Sergey ;
Lee, Gyu Rie ;
Wang, Jue ;
Cong, Qian ;
Kinch, Lisa N. ;
Schaeffer, R. Dustin ;
Millan, Claudia ;
Park, Hahnbeom ;
Adams, Carson ;
Glassman, Caleb R. ;
DeGiovanni, Andy ;
Pereira, Jose H. ;
Rodrigues, Andria V. ;
van Dijk, Alberdina A. ;
Ebrecht, Ana C. ;
Opperman, Diederik J. ;
Sagmeister, Theo ;
Buhlheller, Christoph ;
Pavkov-Keller, Tea ;
Rathinaswamy, Manoj K. ;
Dalwadi, Udit ;
Yip, Calvin K. ;
Burke, John E. ;
Garcia, K. Christopher ;
Grishin, Nick V. ;
Adams, Paul D. ;
Read, Randy J. ;
Baker, David .
SCIENCE, 2021, 373 (6557) :871-+
[4]   Using deep learning to annotate the protein universe [J].
Bileschi, Maxwell L. ;
Belanger, David ;
Bryant, Drew ;
Sanderson, Theo ;
Carter, Brandon ;
Sculley, D. ;
Bateman, Alex ;
DePristo, Mark A. ;
Colwell, Lucy J. .
NATURE BIOTECHNOLOGY, 2022, 40 (06) :932-+
[5]   SCOPe: improvements to the structural classification of proteins - extended database to facilitate variant interpretation and machine learning [J].
Chandonia, John-Marc ;
Guan, Lindsey ;
Lin, Shiangyi ;
Yu, Changhua ;
Fox, Naomi K. ;
Brenner, Steven E. .
NUCLEIC ACIDS RESEARCH, 2022, 50 (D1) :D553-D559
[6]   ECOD: An Evolutionary Classification of Protein Domains [J].
Cheng, Hua ;
Schaeffer, R. Dustin ;
Liao, Yuxing ;
Kinch, Lisa N. ;
Pei, Jimin ;
Shi, Shuoyong ;
Kim, Bong-Hyun ;
Grishin, Nick V. .
PLOS COMPUTATIONAL BIOLOGY, 2014, 10 (12)
[7]   The relation between the divergence of sequence and structure in proteins [J].
Chothia, C. ;
Lesk, A. M. .
EMBO JOURNAL, 1986, 5 (04) :823-826
[8]   Biopython: freely available Python tools for computational molecular biology and bioinformatics [J].
Cock, Peter J. A. ;
Antao, Tiago ;
Chang, Jeffrey T. ;
Chapman, Brad A. ;
Cox, Cymon J. ;
Dalke, Andrew ;
Friedberg, Iddo ;
Hamelryck, Thomas ;
Kauff, Frank ;
Wilczynski, Bartek ;
de Hoon, Michiel J. L. .
BIOINFORMATICS, 2009, 25 (11) :1422-1423
[9]   CATH functional families predict functional sites in proteins [J].
Das, Sayoni ;
Scholes, Harry M. ;
Sen, Neeladri ;
Orengo, Christine .
BIOINFORMATICS, 2021, 37 (08) :1099-1106
[10]   Functional classification of CATH superfamilies: a domain-based approach for protein function annotation [J].
Das, Sayoni ;
Lee, David ;
Sillitoe, Ian ;
Dawson, Natalie L. ;
Lees, Jonathan G. ;
Orengo, Christine A. .
BIOINFORMATICS, 2015, 31 (21) :3460-3467