The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

Cited by: 100
Authors
Petrovski, Slave [1, 2, 3]
Gussow, Ayal B. [1, 2, 4]
Wang, Quanli [1, 2]
Halvorsen, Matt [1, 2]
Han, Yujun [2]
Weir, William H. [2]
Allen, Andrew S. [2, 5]
Goldstein, David B. [1, 2]
Affiliations
[1] Columbia Univ, Inst Genom Med, New York, NY 10027 USA
[2] Duke Univ, Sch Med, Ctr Human Genome Variat, Durham, NC USA
[3] Univ Melbourne, Dept Med, Austin Hlth & Royal Melbourne Hosp, Melbourne, Vic, Australia
[4] Duke Univ, Program Computat Biol & Bioinformat, Durham, NC USA
[5] Duke Univ, Dept Biostat & Bioinformat, Durham, NC USA
Source
PLOS GENETICS | 2015 / Vol. 11 / Issue 9
Funding
National Institutes of Health
Keywords
DE-NOVO MUTATIONS; FRAMEWORK; SCHIZOPHRENIA; ANNOTATION; DISCOVERY; VARIANTS; PATTERNS; DATABASE; DISEASE; HUMANS
DOI
10.1371/journal.pgen.1005492
Chinese Library Classification
Q3 [Genetics]
Discipline Classification Codes
071007; 090102
Abstract
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to regulatory regions the kinds of analytical paradigms that are being used successfully to identify risk-influencing mutations in protein-coding regions. To determine whether dosage-sensitive genes have distinct patterns in their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory-sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists, each of which uses a different way to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria.
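The ncRVIS idea of comparing observed with predicted standing variation can be illustrated with an RVIS-style regression. The sketch below is an assumption modeled on the original RVIS, not the authors' pipeline: it regresses each gene's count of common regulatory variants on its total regulatory variant count by ordinary least squares and reports the studentized residual, so a negative value means fewer common variants than predicted (a more intolerant regulatory sequence). The function name `rvis_style_scores`, the toy gene names and counts, and the choice of internal studentization are hypothetical; the exact inputs, allele-frequency threshold, and regulatory-region definition used for ncRVIS are not given in this record.

```python
import math
from typing import Sequence


def rvis_style_scores(total: Sequence[float], common: Sequence[float]) -> list[float]:
    """Hypothetical RVIS-style score: internally studentized OLS residuals.

    total[i]:  total number of variants observed in gene i's regulatory sequence
    common[i]: number of those variants that are common (threshold assumed)
    Negative scores mean fewer common variants than the regression predicts,
    i.e. the regulatory sequence looks more intolerant to variation.
    """
    n = len(total)
    if n != len(common) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 genes")

    mean_x = sum(total) / n
    mean_y = sum(common) / n
    sxx = sum((x - mean_x) ** 2 for x in total)
    if sxx == 0:
        raise ValueError("total variant counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(total, common))

    # Ordinary least-squares fit: common ~ intercept + slope * total.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(total, common)]

    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    s2 = sum(r * r for r in residuals) / (n - 2)
    if s2 == 0:
        return [0.0] * n  # perfect fit: no gene deviates from the prediction

    scores = []
    for x, r in zip(total, residuals):
        leverage = 1 / n + (x - mean_x) ** 2 / sxx  # hat-matrix diagonal h_ii
        scores.append(r / math.sqrt(s2 * (1 - leverage)))
    return scores


if __name__ == "__main__":
    # Toy counts for five hypothetical genes; GENE5 carries far fewer common
    # variants than its total variant count predicts.
    genes = ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"]
    total = [10, 20, 30, 40, 50]
    common = [5, 10, 15, 20, 5]
    for gene, score in zip(genes, rvis_style_scores(total, common)):
        print(f"{gene}\t{score:+.3f}")
```

Running it ranks GENE5 lowest (most intolerant), because its 5 common variants fall well below the 13 predicted from its 50 total variants. Studentizing, rather than using raw residuals, puts genes with very different variant counts on a comparable scale.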
Analogously to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find that both scores are significantly predictive of human dosage-sensitive genes and appear to carry information beyond the conservation captured by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches, helping to identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
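How well a per-gene noncoding score separates a gene list from the remaining genes can be summarized, for example, by the area under the ROC curve (AUC). The sketch below is not the statistical procedure reported in the paper, which this record does not describe; it computes the Mann-Whitney form of the AUC, assuming that lower scores indicate greater intolerance (as for RVIS-style scores). The function name `auc_lower_is_positive` and the toy genes and scores are hypothetical. For scores where higher means more important or more conserved, as is typical of GERP++, CADD, and GWAVA outputs, negate the scores before calling it.

```python
from typing import Iterable, Mapping


def auc_lower_is_positive(scores: Mapping[str, float], positives: Iterable[str]) -> float:
    """Mann-Whitney estimate of the ROC AUC for a per-gene score.

    Returns the probability that a randomly chosen scored gene from `positives`
    (e.g. a hypothetical haploinsufficiency list) has a LOWER score than a
    randomly chosen scored gene outside it; ties count as one half.
    A value of 0.5 means the score does not discriminate at all.
    """
    positive_set = set(positives)
    pos = [s for gene, s in scores.items() if gene in positive_set]
    neg = [s for gene, s in scores.items() if gene not in positive_set]
    if not pos or not neg:
        raise ValueError("need at least one scored gene inside and outside the list")

    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)) pairwise comparisons
        for q in neg:
            if p < q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores; lower = more intolerant regulatory sequence.
    toy_scores = {"GENE_A": -2.1, "GENE_B": -0.4, "GENE_C": 0.3, "GENE_D": 1.5}
    toy_list = {"GENE_A", "GENE_C"}
    print(f"AUC = {auc_lower_is_positive(toy_scores, toy_list):.3f}")
```

On the toy input the script prints `AUC = 0.750`. Genes on the list that have no score are ignored; for genome-scale inputs, a rank-based computation would avoid the quadratic loop.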
Pages: 25