Characterizing and explaining the impact of disease-associated mutations in proteins without known structures or structural homologs

Times cited: 17
Authors
Sen, Neeladri
Anishchenko, Ivan
Bordin, Nicola
Sillitoe, Ian
Velankar, Sameer
Baker, David [1, 2]
Orengo, Christine [3]
Affiliations
[1] Univ Washington, Seattle, WA 98195 USA
[2] Inst Prot Design, Seattle, WA USA
[3] UCL, Bioinformat, London, England
Funding
Biotechnology and Biological Sciences Research Council (UK); National Science Foundation (USA)
Keywords
protein structure modeling; mutation; AlphaFold; RoseTTAFold; disease-associated; functional site; WEB SERVER; PREDICTION; CLASSIFICATION; DOMAIN; DATABASE; FAMILY; IMPROVEMENTS; ACCURATE; REGIONS; CATH;
DOI
10.1093/bib/bbac187
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Mutations in human proteins can cause disease. The structures of these proteins can help us understand the mechanisms of such diseases and develop therapeutics against them. With improved deep learning techniques such as RoseTTAFold and AlphaFold, we can predict the structures of proteins even in the absence of structural homologs. We modeled and extracted the domains of 553 disease-associated human proteins that have no known structures or close homologs in the Protein Data Bank. Model quality was higher, and the root-mean-square deviation (RMSD) between AlphaFold and RoseTTAFold models was lower, for domains that could be assigned to CATH families than for domains that could only be assigned to Pfam families of unknown structure or could not be assigned to either. We predicted ligand-binding sites, protein-protein interfaces and conserved residues in these predicted structures. We then explored whether the disease-associated missense mutations lay close to these predicted functional sites, whether they destabilized the protein structure according to ddG calculations, and whether they were predicted to be pathogenic. We could explain 80% of these disease-associated mutations by proximity to functional sites, structural destabilization or pathogenicity. Compared with polymorphisms, a larger percentage of disease-associated missense mutations were buried, lay closer to predicted functional sites, and were predicted to be destabilizing and pathogenic. Using models from both state-of-the-art techniques gives greater confidence in our predictions: we explain 93 additional mutations using RoseTTAFold models that could not be explained using AlphaFold models alone.
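The abstract's three criteria for "explaining" a mutation (proximity to a predicted functional site, destabilization by ddG, predicted pathogenicity), together with the RMSD used to compare AlphaFold and RoseTTAFold models, can be sketched in Python as below. This is a hypothetical illustration, not the authors' pipeline: the function and class names, the 5 Å distance cutoff, the 1 kcal/mol ddG cutoff and the sign convention (positive ddG means destabilizing) are all assumptions, and the RMSD function assumes the two models have already been superposed on matched residues.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical thresholds, chosen for illustration only (not from the paper).
SITE_CUTOFF_A = 5.0   # max distance (angstrom) to count as "near" a functional site
DDG_CUTOFF = 1.0      # kcal/mol; assumes positive ddG = destabilizing


def rmsd(model_a: Sequence[Coord], model_b: Sequence[Coord]) -> float:
    """RMSD over matched atoms (e.g. C-alpha atoms of two models of one domain).

    Assumes the models are ALREADY superposed; no structural fitting is done here.
    """
    if len(model_a) != len(model_b) or not model_a:
        raise ValueError("models must have the same, non-zero number of atoms")
    sq_sum = sum(math.dist(a, b) ** 2 for a, b in zip(model_a, model_b))
    return math.sqrt(sq_sum / len(model_a))


def min_distance(residue: Coord, site: Sequence[Coord]) -> float:
    """Smallest distance from one residue coordinate to any residue of a site."""
    return min((math.dist(residue, s) for s in site), default=math.inf)


@dataclass
class Mutation:
    name: str          # hypothetical label, e.g. "R123W"
    position: Coord    # representative coordinate of the mutated residue
    ddg: float         # predicted stability change (sign convention assumed)
    pathogenic: bool   # output of some pathogenicity predictor (hypothetical)


def explain(mut: Mutation, sites: Dict[str, Sequence[Coord]]) -> List[str]:
    """Return every criterion that 'explains' the mutation (empty list = unexplained)."""
    reasons = []
    for label, coords in sites.items():
        if min_distance(mut.position, coords) <= SITE_CUTOFF_A:
            reasons.append(f"near {label}")
    if mut.ddg >= DDG_CUTOFF:
        reasons.append("destabilizing")
    if mut.pathogenic:
        reasons.append("predicted pathogenic")
    return reasons


def fraction_explained(muts: Sequence[Mutation], sites: Dict[str, Sequence[Coord]]) -> float:
    """Fraction of mutations explained by at least one criterion."""
    if not muts:
        return 0.0
    return sum(1 for m in muts if explain(m, sites)) / len(muts)


if __name__ == "__main__":
    # Two toy two-atom "models", offset by 1 angstrom along z.
    print(rmsd([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)], [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]))
    sites = {"ligand-binding site": [(10.0, 0.0, 0.0)], "interface": [(0.0, 20.0, 0.0)]}
    muts = [
        Mutation("M1", (7.0, 0.0, 0.0), ddg=0.2, pathogenic=False),   # near a site
        Mutation("M2", (0.0, 0.0, 50.0), ddg=2.5, pathogenic=False),  # destabilizing
        Mutation("M3", (0.0, 0.0, 50.0), ddg=0.0, pathogenic=False),  # unexplained
    ]
    for m in muts:
        print(m.name, explain(m, sites))
    print(fraction_explained(muts, sites))
```

Because a mutation counts as explained if any one criterion holds, combining evidence from several sources (or from two modeling methods, as in the abstract) can only raise the explained fraction, never lower it.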
Pages: 15