AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Cited by: 41
Authors
Bordin, Nicola [1]
Sillitoe, Ian [1]
Nallapareddy, Vamsi [1]
Rauer, Clemens [1]
Lam, Su Datt [1,2]
Waman, Vaishali P. [1]
Sen, Neeladri [1]
Heinzinger, Michael [3]
Littmann, Maria [3]
Kim, Stephanie [4,5]
Velankar, Sameer [6]
Steinegger, Martin [4,5]
Rost, Burkhard [3,7,8]
Orengo, Christine [1]
Affiliations
[1] UCL, Inst Struct & Mol Biol, London WC1E 6BT, England
[2] Univ Kebangsaan Malaysia, Fac Sci & Technol, Dept Appl Phys, Bangi 43600, Selangor, Malaysia
[3] TUM Tech Univ Munich, Dept Informat Bioinformat & Computat Biol, i12,Boltzmannstr 3, D-85748 Munich, Germany
[4] Seoul Natl Univ, Sch Biol Sci, Seoul, South Korea
[5] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
[6] European Bioinformat Inst, European Mol Biol Lab, Hinxton, England
[7] Inst Adv Study TUM IAS, Lichtenbergstr 2a, D-85748 Munich, Germany
[8] Alte Akad 8, TUM Sch Life Sci Weihenstephan WZW, Freising Weihenstephan, Germany
Funding
Biotechnology and Biological Sciences Research Council (UK); National Research Foundation, Singapore; Wellcome Trust (UK);
Keywords
CLASSIFICATION; SEQUENCE; PREDICTION; CATH; DATABASE; IMPACT;
DOI
10.1038/s42003-023-04488-9
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining models cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
Pages: 12