NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans

Cited by: 37
Authors
Caron, Barthelemy [1]
Luo, Yufei [1]
Rausell, Antonio [1, 2]
Affiliations
[1] Paris Descartes Univ, Sorbonne Paris Cite, Imagine Inst, Clin Bioinformat Lab, F-75015 Paris, France
[2] INSERM, Inst Imagine, UMR 1163, F-75015 Paris, France
Keywords
Mendelian diseases; Whole genome sequencing; Rare variant analysis; Non-coding genetic variants; Pathogenicity score; REGULATORY VARIANTS; NATURAL-SELECTION; DNA ELEMENTS; GENOME; GENES; IDENTIFICATION; MUTATIONS; FRAMEWORK; MODEL;
DOI
10.1186/s13059-019-1634-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods.
Pages: 22
Related papers
71 entries in total
[11]   The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities [J].
Chong, Jessica X. ;
Buckingham, Kati J. ;
Jhangiani, Shalini N. ;
Boehm, Corinne ;
Sobreira, Nara ;
Smith, Joshua D. ;
Harrell, Tanya M. ;
McMillin, Margaret J. ;
Wiszniewski, Wojciech ;
Gambin, Tomasz ;
Akdemir, Zeynep H. Coban ;
Doheny, Kimberly ;
Scott, Alan F. ;
Avramopoulos, Dimitri ;
Chakravarti, Aravinda ;
Hoover-Fong, Julie ;
Mathews, Debra ;
Witmer, P. Dane ;
Ling, Hua ;
Hetrick, Kurt ;
Watkins, Lee ;
Patterson, Karynne E. ;
Reinier, Frederic ;
Blue, Elizabeth ;
Muzny, Donna ;
Kircher, Martin ;
Bilguvar, Kaya ;
Lopez-Giraldez, Francesc ;
Sutton, V. Reid ;
Tabor, Holly K. ;
Lea, Suzanne M. ;
Gunel, Murat ;
Mane, Shrikant ;
Gibbs, Richard A. ;
Boerwinkle, Eric ;
Hamosh, Ada ;
Shendure, Jay ;
Lupski, James R. ;
Lifton, Richard P. ;
Valle, David ;
Nickerson, Deborah A. ;
Bamshad, Michael J. .
AMERICAN JOURNAL OF HUMAN GENETICS, 2015, 97 (02) :199-215
[12]   Identification of human haploinsufficient genes and their genomic proximity to segmental duplications [J].
Dang, Vinh T. ;
Kassahn, Karin S. ;
Marcos, Andres Esteban ;
Ragan, Mark A. .
EUROPEAN JOURNAL OF HUMAN GENETICS, 2008, 16 (11) :1350-1357
[13]   Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++ [J].
Davydov, Eugene V. ;
Goode, David L. ;
Sirota, Marina ;
Cooper, Gregory M. ;
Sidow, Arend ;
Batzoglou, Serafim .
PLOS COMPUTATIONAL BIOLOGY, 2010, 6 (12)
[14]   The human noncoding genome defined by genetic diversity [J].
di Iulio, Julia ;
Bartha, Istvan ;
Wong, Emily H. M. ;
Yu, Hung-Chun ;
Lavrenko, Victor ;
Yang, Dongchan ;
Jung, Inkyung ;
Hicks, Michael A. ;
Shah, Naisha ;
Kirkness, Ewen F. ;
Fabani, Martin M. ;
Biggs, William H. ;
Ren, Bing ;
Venter, J. Craig ;
Telenti, Amalio .
NATURE GENETICS, 2018, 50 (03) :333-+
[15]   An integrated encyclopedia of DNA elements in the human genome [J].
Dunham, Ian ;
Kundaje, Anshul ;
Aldred, Shelley F. ;
Collins, Patrick J. ;
Davis, Carrie A. ;
Doyle, Francis ;
Epstein, Charles B. ;
Frietze, Seth ;
Harrow, Jennifer ;
Kaul, Rajinder ;
Khatun, Jainab ;
Lajoie, Bryan R. ;
Landt, Stephen G. ;
Lee, Bum-Kyu ;
Pauli, Florencia ;
Rosenbloom, Kate R. ;
Sabo, Peter ;
Safi, Alexias ;
Sanyal, Amartya ;
Shoresh, Noam ;
Simon, Jeremy M. ;
Song, Lingyun ;
Trinklein, Nathan D. ;
Altshuler, Robert C. ;
Birney, Ewan ;
Brown, James B. ;
Cheng, Chao ;
Djebali, Sarah ;
Dong, Xianjun ;
Dunham, Ian ;
Ernst, Jason ;
Furey, Terrence S. ;
Gerstein, Mark ;
Giardine, Belinda ;
Greven, Melissa ;
Hardison, Ross C. ;
Harris, Robert S. ;
Herrero, Javier ;
Hoffman, Michael M. ;
Iyer, Sowmya ;
Kellis, Manolis ;
Khatun, Jainab ;
Kheradpour, Pouya ;
Kundaje, Anshul ;
Lassmann, Timo ;
Li, Qunhua ;
Lin, Xinying ;
Marinov, Georgi K. ;
Merkel, Angelika ;
Mortazavi, Ali .
NATURE, 2012, 489 (7414) :57-74
[16]   The impact of recombination on nucleotide substitutions in the human genome [J].
Duret, Laurent ;
Arndt, Peter F. .
PLOS GENETICS, 2008, 4 (05)
[17]   Genetic and epigenetic fine mapping of causal autoimmune disease variants [J].
Farh, Kyle Kai-How ;
Marson, Alexander ;
Zhu, Jiang ;
Kleinewietfeld, Markus ;
Housley, William J. ;
Beik, Samantha ;
Shoresh, Noam ;
Whitton, Holly ;
Ryan, Russell J. H. ;
Shishkin, Alexander A. ;
Hatan, Meital ;
Carrasco-Alfonso, Marlene J. ;
Mayer, Dita ;
Luckey, C. John ;
Patsopoulos, Nikolaos A. ;
De Jager, Philip L. ;
Kuchroo, Vijay K. ;
Epstein, Charles B. ;
Daly, Mark J. ;
Hafler, David A. ;
Bernstein, Bradley E. .
NATURE, 2015, 518 (7539) :337-343
[18]   Field M.J., 2010, Rare Diseases and Orphan Products: Accelerating Research and Development
[19]   Greedy function approximation: A gradient boosting machine [J].
Friedman, JH .
ANNALS OF STATISTICS, 2001, 29 (05) :1189-1232
[20]   Selection and Adaptation in the Human Genome [J].
Fu, Wenqing ;
Akey, Joshua M. .
ANNUAL REVIEW OF GENOMICS AND HUMAN GENETICS, VOL 14, 2013, 14 :467-489