MULTISCALE POISSON PROCESS APPROACHES FOR DETECTING AND ESTIMATING DIFFERENCES FROM HIGH-THROUGHPUT SEQUENCING ASSAYS

Citations: 0
Authors
Shim, Heejung [1]
Xing, Zhengrong [2]
Pantaleo, Ester [2]
Luca, Francesca [3, 4]
Pique-Regi, Roger [4, 5]
Stephens, Matthew [6]
Affiliations
[1] Univ Melbourne, Sch Math & Stat & Melbourne Integrat Genom, Melbourne, Australia
[2] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[3] Wayne State Univ, Dept Obstet & Gynecol, Detroit, MI USA
[4] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI USA
[5] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI USA
[6] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Keywords
Multiscale Poisson processes; wavelets; differential expression analysis; high-throughput sequencing assays; high-resolution; Bayesian inference; functional data; count data; RNA-seq; DNase-seq; ATAC-seq; chromatin accessibility; RNA-SEQ; EXPRESSION ANALYSIS; OPEN CHROMATIN; IN-VIVO; ASSOCIATION
DOI
10.1214/23-AOAS1828
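Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract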
Estimating and testing for differences in molecular phenotypes (e.g., gene expression, chromatin accessibility, transcription factor binding) across conditions is an important part of understanding the molecular basis of gene regulation. These phenotypes are commonly measured using high-throughput, high-resolution count data that reflect how the phenotypes vary along the genome. Multiple methods have been proposed to help exploit these high-resolution measurements for differential expression analysis. However, they ignore the count nature of the data, instead using normal distributions that work well only for data with large sample sizes or high counts. Here we develop count-based methods to address this problem. We model the data for each sample using an inhomogeneous Poisson process with a spatially structured underlying intensity function and then, building on multiscale models for the Poisson process, estimate and test for differences in the underlying intensity function across samples (or groups of samples). Using both simulation and real ATAC-seq data, we show that our method outperforms previous normal-based methods, especially in situations with small sample sizes or low counts.
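The multiscale idea mentioned in the abstract rests on a standard property of Poisson counts: if adjacent bins have independent Poisson counts, then given their sum, the left bin's count is binomial with success probability equal to the left bin's share of the combined intensity. Applying this recursively turns a vector of counts into one total plus a set of binomial "splits" at each scale. The Python sketch below illustrates only this generic factorization; it is not the authors' implementation, and the function names (`multiscale_decompose`, `reconstruct`, `split_log_odds`) and the pseudocount of 0.5 are assumptions made for illustration.

```python
import numpy as np


def multiscale_decompose(counts):
    """Split a length-2^J count vector into a total and per-scale (left, parent) pairs.

    Hypothetical helper: levels[0] is the finest scale, levels[-1] the coarsest.
    """
    current = np.asarray(counts, dtype=np.int64)
    n = current.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = []
    while current.size > 1:
        left = current[0::2]          # left child of each adjacent pair
        parent = left + current[1::2]  # pair total
        levels.append((left, parent))
        current = parent
    return int(current[0]), levels


def reconstruct(total, levels):
    """Invert multiscale_decompose, showing that no information is lost."""
    current = np.array([total], dtype=np.int64)
    for left, _parent in reversed(levels):
        out = np.empty(2 * left.size, dtype=np.int64)
        out[0::2] = left
        out[1::2] = current - left    # right child = parent - left child
        current = out
    return current


def split_log_odds(left, parent, pseudo=0.5):
    """Empirical log-odds of the left split; the pseudocount (an assumption) avoids log(0)."""
    left = np.asarray(left, dtype=float)
    parent = np.asarray(parent, dtype=float)
    return np.log((left + pseudo) / (parent - left + pseudo))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = np.full(8, 5.0)            # made-up intensity for group A
    shifted = base.copy()
    shifted[2:4] *= 3.0               # group B has higher intensity in bins 2-3
    counts_a = rng.poisson(base)
    counts_b = rng.poisson(shifted)
    _, levels_a = multiscale_decompose(counts_a)
    _, levels_b = multiscale_decompose(counts_b)
    for scale, ((la, pa), (lb, pb)) in enumerate(zip(levels_a, levels_b)):
        diff = split_log_odds(lb, pb) - split_log_odds(la, pa)
        print(f"scale {scale}: log-odds difference = {np.round(diff, 2)}")
```

Comparing the split log-odds between groups, scale by scale, gives a rough picture of where and at what resolution two intensity functions differ. A full method would replace the raw differences with a model for the binomial proportions and a formal test.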
Pages: 1773-1788
Page count: 16