AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms

Cited by: 41
Authors
Bordin, Nicola [1]
Sillitoe, Ian [1]
Nallapareddy, Vamsi [1]
Rauer, Clemens [1]
Lam, Su Datt [1,2]
Waman, Vaishali P. [1]
Sen, Neeladri [1]
Heinzinger, Michael [3]
Littmann, Maria [3]
Kim, Stephanie [4,5]
Velankar, Sameer [6]
Steinegger, Martin [4,5]
Rost, Burkhard [3,7,8]
Orengo, Christine [1]
Affiliations
[1] UCL, Inst Struct & Mol Biol, London WC1E 6BT, England
[2] Univ Kebangsaan Malaysia, Fac Sci & Technol, Dept Appl Phys, Bangi 43600, Selangor, Malaysia
[3] TUM Tech Univ Munich, Dept Informat Bioinformat & Computat Biol, i12,Boltzmannstr 3, D-85748 Munich, Germany
[4] Seoul Natl Univ, Sch Biol Sci, Seoul, South Korea
[5] Seoul Natl Univ, Artificial Intelligence Inst, Seoul, South Korea
[6] European Bioinformat Inst, European Mol Biol Lab, Hinxton, England
[7] Inst Adv Study TUM IAS, Lichtenbergstr 2a, D-85748 Munich, Germany
[8] Alte Akad 8, TUM Sch Life Sci Weihenstephan WZW, Freising Weihenstephan, Germany
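Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC); National Research Foundation, Singapore; Wellcome Trust (UK)
Keywords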
CLASSIFICATION; SEQUENCE; PREDICTION; CATH; DATABASE; IMPACT;
DOI
10.1038/s42003-023-04488-9
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remainder cluster into 2367 putative novel superfamilies. Detailed manual analysis of 618 of these, each having at least one human relative, reveals extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increase the number of unique 'global' folds by 36%, and will provide valuable insights on structure-function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
Pages: 12