AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Cited by: 41
Authors
Bordin, Nicola [1]
Sillitoe, Ian [1]
Nallapareddy, Vamsi [1]
Rauer, Clemens [1]
Lam, Su Datt [1,2]
Waman, Vaishali P. [1]
Sen, Neeladri [1]
Heinzinger, Michael [3]
Littmann, Maria [3]
Kim, Stephanie [4,5]
Velankar, Sameer [6]
Steinegger, Martin [4,5]
Rost, Burkhard [3,7,8]
Orengo, Christine [1]
Affiliations
[1] UCL, Inst Struct & Mol Biol, London WC1E 6BT, England
[2] Univ Kebangsaan Malaysia, Fac Sci & Technol, Dept Appl Phys, Bangi 43600, Selangor, Malaysia
[3] TUM Tech Univ Munich, Dept Informat Bioinformat & Computat Biol, i12,Boltzmannstr 3, D-85748 Munich, Germany
[4] Seoul Natl Univ, Sch Biol Sci, Seoul, South Korea
[5] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
[6] European Bioinformat Inst, European Mol Biol Lab, Hinxton, England
[7] Inst Adv Study TUM IAS, Lichtenbergstr 2a, D-85748 Munich, Germany
[8] Alte Akad 8, TUM Sch Life Sci Weihenstephan WZW, Freising Weihenstephan, Germany
Funding
UK Biotechnology and Biological Sciences Research Council; National Research Foundation, Singapore; Wellcome Trust (UK)
Keywords
CLASSIFICATION; SEQUENCE; PREDICTION; CATH; DATABASE; IMPACT;
DOI
10.1038/s42003-023-04488-9
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights into structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
Pages: 12