Comprehensive Analysis of Constraint on the Spatial Distribution of Missense Variants in Human Protein Structures

Cited by: 46
Authors
Sivley, R. Michael [1]
Dou, Xiaoyi [2]
Meiler, Jens [1, 3, 4]
Bush, William S. [5, 6]
Capra, John A. [1, 2, 4, 7, 8]
Affiliations
[1] Vanderbilt Univ, Dept Biomed Informat, Nashville, TN 37232 USA
[2] Vanderbilt Univ, Dept Comp Sci, Nashville, TN 37212 USA
[3] Vanderbilt Univ, Dept Chem, Nashville, TN 37212 USA
[4] Vanderbilt Univ, Struct Biol Ctr, Nashville, TN 37212 USA
[5] Case Western Reserve Univ, Dept Populat & Quantitat Hlth Sci, Cleveland, OH 44106 USA
[6] Case Western Reserve Univ, Inst Computat Biol, Cleveland, OH 44106 USA
[7] Vanderbilt Univ, Dept Biol Sci, Nashville, TN 37232 USA
[8] Vanderbilt Univ, Vanderbilt Genet Inst, Nashville, TN 37232 USA
Keywords
MUTATIONS; REGIONS; CHALLENGES; DISCOVERY; SELECTION; RESIDUES; DOMINANT; SEQUENCE; PTPN11
DOI
10.1016/j.ajhg.2018.01.017
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The spatial distribution of genetic variation within proteins is shaped by evolutionary constraint and provides insight into the functional importance of protein regions and the potential pathogenicity of protein alterations. Here, we comprehensively evaluate the 3D spatial patterns of human germline and somatic variation in 6,604 experimentally derived protein structures and 33,144 computationally derived homology models covering 77% of all human proteins. Using a systematic approach, we quantify differences in the spatial distributions of neutral germline variants, disease-causing germline variants, and recurrent somatic variants. Neutral missense variants exhibit a general trend toward spatial dispersion, which is driven by constraint on core residues. In contrast, germline disease-causing variants are generally clustered in protein structures and form clusters more frequently than recurrent somatic variants identified from tumor sequencing. In total, we identify 215 proteins with significant spatial constraints on the distribution of disease-causing missense variants in experimentally derived protein structures, only 65 (30%) of which have been previously reported. This analysis identifies many clusters not detectable from sequence information alone; only 12% of proteins with significant clustering in 3D were identified from similar analyses of linear protein sequence. Furthermore, spatial analyses of mutations in homology-based structural models are highly correlated with those from experimentally derived structures, supporting the use of computationally derived models. Our approach highlights significant differences in the spatial constraints on different classes of mutations in protein structure and identifies regions of potential function within individual proteins.
Pages: 415-426
Page count: 12