Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

Cited by: 64
Authors
Iqbal, Sumaiya [1, 2, 3, 4]
Perez-Palma, Eduardo [5]
Jespersen, Jakob B. [6]
May, Patrick [7]
Hoksza, David [7]
Heyne, Henrike O. [2, 4, 8]
Ahmed, Shehab S. [9]
Rifat, Zaara T. [9]
Rahman, M. Sohel [9]
Lage, Kasper [2, 10]
Palotie, Aarno [2, 3, 9]
Cottrell, Jeffrey R. [2]
Wagner, Florence F. [1, 2]
Daly, Mark J. [2, 3, 4, 8, 11]
Campbell, Arthur J. [1, 2, 11]
Lal, Dennis [1, 2, 5, 10, 11, 12]
Affiliations
[1] Broad Inst MIT & Harvard, Ctr Dev Therapeut, Cambridge, MA 02142 USA
[2] Broad Inst MIT & Harvard, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
[3] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02142 USA
[4] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[5] Tech Univ Denmark, Dept Bio & Hlth Informat, DK-2800 Lyngby, Denmark
[6] Univ Luxembourg, Luxembourg Ctr Syst Biomed, L-4365 Esch Sur Alzette, Luxembourg
[7] Charles Univ Prague, Dept Software Engn, Fac Math & Phys, Prague 11636, Czech Republic
[8] Univ Helsinki, Inst Mol Med Finland, Helsinki 00100, Finland
[9] Bangladesh Univ Engn & Technol, Comp Sci & Engn, Dhaka 105, Bangladesh
[10] Massachusetts Gen Hosp, Dept Surg, Boston, MA 02114 USA
[11] Univ Cologne, Cologne Ctr Genom, D-50931 Cologne, Germany
[12] Cleveland Clin, Neurol Inst, Epilepsy Ctr, Cleveland, OH 44195 USA
Keywords
missense variant interpretation; protein structure and function; disease variation effect; 3D mutational hotspot; machine learning; DISEASE; MUTATIONS; SEQUENCE; PREDICTION; PATHOGENICITY; EVOLUTION; REGIONS; VERSION; SITES
DOI
10.1073/pnas.2002660117
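Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09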
Abstract
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.
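The abstract does not say which statistic underlies the "statistical burden" of variants on a 3D feature, nor how the "divergence" of class-specific features is computed. As a hypothetical illustration only, the sketch below assumes burden is a per-feature odds ratio (with a Woolf log-scale 95% confidence interval) comparing pathogenic against population variants, and assumes divergence is the fraction of shared features whose enrichment label differs between one protein class and all genes. The helper names (`feature_burden`, `burden_label`, `divergence`) and the counts in the example are invented for illustration; they are not the authors' method, code, or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureBurden:
    """Enrichment of pathogenic vs. population variants at positions carrying one 3D feature."""
    feature: str
    odds_ratio: float
    ci_low: float
    ci_high: float


def feature_burden(feature: str, path_in: int, path_total: int,
                   pop_in: int, pop_total: int, z: float = 1.96) -> FeatureBurden:
    """Hypothetical helper: odds ratio (with a Woolf log-scale CI) that a pathogenic
    variant, rather than a population variant, sits at a position with `feature`."""
    a, b = path_in, path_total - path_in   # pathogenic: in / not in the feature
    c, d = pop_in, pop_total - pop_in      # population: in / not in the feature
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative and cannot exceed totals")
    # Haldane-Anscombe correction: add 0.5 to every cell if any cell is zero.
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return FeatureBurden(feature, math.exp(log_or),
                         math.exp(log_or - z * se), math.exp(log_or + z * se))


def burden_label(burden: FeatureBurden) -> str:
    """Call a feature enriched in one variant set only if its CI excludes 1."""
    if burden.ci_low > 1:
        return "pathogenic"
    if burden.ci_high < 1:
        return "population"
    return "neutral"


def divergence(general: dict[str, str], specific: dict[str, str]) -> float:
    """Assumed definition: fraction of shared features whose label differs
    between a protein class (`specific`) and the all-genes analysis (`general`)."""
    shared = general.keys() & specific.keys()
    if not shared:
        return 0.0
    return sum(general[f] != specific[f] for f in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical counts, not data from the paper.
    buried = feature_burden("buried", path_in=60, path_total=100, pop_in=30, pop_total=100)
    print(buried, burden_label(buried))
```

The odds ratio is only one plausible reading of "burden"; a relative risk or Fisher's exact test would be equally reasonable choices, so the statistic actually used should be taken from the full paper.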
Pages: 28201-28211
Page count: 11