Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

Times cited: 64
Authors
Iqbal, Sumaiya [1, 2, 3, 4]
Perez-Palma, Eduardo [5]
Jespersen, Jakob B. [6]
May, Patrick [7]
Hoksza, David [7]
Heyne, Henrike O. [2, 4, 8]
Ahmed, Shehab S. [9]
Rifat, Zaara T. [9]
Rahman, M. Sohel [9]
Lage, Kasper [2, 10]
Palotie, Aarno [2, 3, 9]
Cottrell, Jeffrey R. [2]
Wagner, Florence F. [1, 2]
Daly, Mark J. [2, 3, 4, 8, 11]
Campbell, Arthur J. [1, 2, 11]
Lal, Dennis [1, 2, 5, 10, 11, 12]
Affiliations
[1] Broad Inst MIT & Harvard, Ctr Dev Therapeut, Cambridge, MA 02142 USA
[2] Broad Inst MIT & Harvard, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
[3] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02142 USA
[4] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[5] Tech Univ Denmark, Dept Bio & Hlth Informat, DK-2800 Lyngby, Denmark
[6] Univ Luxembourg, Luxembourg Ctr Syst Biomed, L-4365 Esch Sur Alzette, Luxembourg
[7] Charles Univ Prague, Dept Software Engn, Fac Math & Phys, Prague 11636, Czech Republic
[8] Univ Helsinki, Inst Mol Med Finland, Helsinki 00100, Finland
[9] Bangladesh Univ Engn & Technol, Comp Sci & Engn, Dhaka 105, Bangladesh
[10] Massachusetts Gen Hosp, Dept Surg, Boston, MA 02114 USA
[11] Univ Cologne, Cologne Ctr Genom, D-50931 Cologne, Germany
[12] Cleveland Clin, Neurol Inst, Epilepsy Ctr, Cleveland, OH 44195 USA
Keywords
missense variant interpretation; protein structure and function; disease variation effect; 3D mutational hotspot; machine learning; DISEASE; MUTATIONS; SEQUENCE; PREDICTION; PATHOGENICITY; EVOLUTION; REGIONS; VERSION; SITES;
DOI
10.1073/pnas.2002660117
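Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Discipline classification codes
07; 0710; 09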
Abstract
Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations' positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants' pathogenicity in terms of the perturbed molecular mechanisms.
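The abstract does not say which statistic is used to measure the "statistical burden" of variants on a 3D feature. The sketch below assumes that each feature is binary (for example, a hypothetical "residue is buried" flag) and that there are two variant sets, pathogenic and population. Under those assumptions, one common way to measure burden is to compare how often each set falls on positions carrying the feature, using an odds ratio and a two-sided Fisher's exact test. This is an illustration only and is not the authors' pipeline. The function names, the input format, and the toy counts are all hypothetical.

```python
from math import comb, inf, nan
from typing import Sequence, Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio of the 2x2 table [[a, b], [c, d]].

    Rows: pathogenic variants, population variants.
    Columns: residue has the feature, residue lacks it.
    """
    if b * c == 0:
        return inf if a * d > 0 else nan
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p_x = prob(x)
        # Small relative tolerance so floating-point ties are included.
        if p_x <= p_obs * (1 + 1e-7):
            p_value += p_x
    return min(1.0, p_value)


def feature_burden(pathogenic: Sequence[bool],
                   population: Sequence[bool]) -> Tuple[float, float]:
    """Burden of one binary 3D feature (hypothetical input format).

    Each element is True if that variant's residue carries the feature.
    Returns (odds ratio, two-sided Fisher's exact p-value).
    """
    a = sum(pathogenic)
    b = len(pathogenic) - a
    c = sum(population)
    d = len(population) - c
    return odds_ratio(a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Toy data, not from the paper: 30 of 40 pathogenic variants and
# 20 of 60 population variants fall on residues with the feature.
patho = [True] * 30 + [False] * 10
pop = [True] * 20 + [False] * 40
or_, p = feature_burden(patho, pop)
print(or_)  # 6.0
```

An odds ratio above 1 with a small p-value would mark the feature as enriched at pathogenic positions relative to population variants. In a per-feature, per-class analysis like the one the abstract describes, a multiple-testing correction would also be needed, and the sketch above omits it.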
Pages: 28201-28211
Number of pages: 11