OutPyR: Bayesian inference for RNA-Seq outlier detection

Cited by: 7
Authors
Salkovic, Edin [1]
Abbas, Mostafa M. [2]
Belhaouari, Samir Brahim [1]
Errafii, Khaoula [3, 4]
Bensmail, Halima [1, 2]
Affiliations
[1] Hamad Bin Khalifa Univ, Coll Sci Engn, POB 34110, Doha, Qatar
[2] Hamad Bin Khalifa Univ, Qatar Comp Res Inst, POB 34110, Doha, Qatar
[3] Hamad Bin Khalifa Univ, Qatar Biomed Res Inst, POB 34110, Doha, Qatar
[4] Hamad Bin Khalifa Univ, Coll Hlth & Life Sci, POB 34110, Doha, Qatar
Keywords
RNA-Seq; Outlier detection; Bayesian modeling; Gene expression; Count
DOI
10.1016/j.jocs.2020.101245
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
High-throughput RNA sequencing (RNA-Seq) has recently come into use as an aid in diagnosing rare genetic disorders, because it can reveal abnormal gene expression counts, a telltale sign of genetic pathology. Existing solutions either require a large number of samples or do not provide proper statistical significance testing. We present a Bayesian model (OutPyR) for identifying abnormal RNA-Seq gene expression counts in datasets, particularly those with a small number of samples. The model incorporates recently introduced data-augmentation techniques to efficiently and accurately infer the parameters of the underlying negative binomial process, while also assessing the uncertainty of the inference and making it possible to generate simulated data. The model's software implementation is object-oriented and thus easily extensible, and it provides parameter-trace exploration, fault tolerance, and recovery during the parameter estimation process. We also develop a p-value-based outlier score that stems naturally from our model. We apply the model to real and simulated datasets for different organisms and tissues, and present comparisons with existing models. Our model is implemented purely in Python, and its standalone source code is available at https://github.com/esalkovic/outpyr.
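To make the idea of a p-value-based outlier score under a negative binomial model concrete, here is a minimal sketch. It is not OutPyR's implementation and does not use its API: the function names are hypothetical, and the mean and dispersion are assumed to be known for a gene, whereas the abstract describes inferring the parameters with Bayesian data-augmentation techniques. The sketch only shows how a two-sided tail probability can flag an abnormally low or high count.

```python
import math


def nb_log_pmf(k, r, p):
    """Log-probability of k under NB(r, p): k failures before r successes."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_cdf(k, r, p):
    """P(X <= k), computed by direct summation of the PMF."""
    if k < 0:
        return 0.0
    total = sum(math.exp(nb_log_pmf(i, r, p)) for i in range(k + 1))
    return min(1.0, total)


def two_sided_p_value(k, mean, dispersion):
    """Two-sided tail p-value for an observed count k.

    Assumption (illustrative only): the variance is
    mean + dispersion * mean**2, so r = 1 / dispersion
    and p = r / (r + mean).
    """
    r = 1.0 / dispersion
    p = r / (r + mean)
    lower = nb_cdf(k, r, p)                        # P(X <= k)
    upper = max(0.0, 1.0 - nb_cdf(k - 1, r, p))    # P(X >= k)
    return min(1.0, 2.0 * min(lower, upper))


if __name__ == "__main__":
    # Hypothetical gene with expected count 100 and dispersion 0.1.
    for count in (0, 100, 1000):
        print(count, two_sided_p_value(count, mean=100.0, dispersion=0.1))
```

Counts close to the expected value give a p-value near 1, while counts far in either tail give a value near 0. In practice such scores would be computed for every gene and sample and corrected for multiple testing.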
Pages: 9