NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans

Times cited: 37
Authors
Caron, Barthelemy [1]
Luo, Yufei [1]
Rausell, Antonio [1, 2]
Affiliations
[1] Paris Descartes Univ, Sorbonne Paris Cite, Imagine Inst, Clin Bioinformat Lab, F-75015 Paris, France
[2] INSERM, Inst Imagine, UMR 1163, F-75015 Paris, France
Keywords
Mendelian diseases; Whole genome sequencing; Rare variant analysis; Non-coding genetic variants; Pathogenicity score; REGULATORY VARIANTS; NATURAL-SELECTION; DNA ELEMENTS; GENOME; GENES; IDENTIFICATION; MUTATIONS; FRAMEWORK; MODEL
DOI
10.1186/s13059-019-1634-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
State-of-the-art methods assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, yet with modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, a comprehensive set of recent and ongoing purifying selection signals in humans is explored, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on such features achieves a high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse learning and independent testing data sets and outperforms other existing reference methods.
Pages: 22