MULTISCALE POISSON PROCESS APPROACHES FOR DETECTING AND ESTIMATING DIFFERENCES FROM HIGH-THROUGHPUT SEQUENCING ASSAYS

Cited by: 0
Authors
Shim, Heejung [1]
Xing, Zhengrong [2]
Pantaleo, Ester [2]
Luca, Francesca [3, 4]
Pique-Regi, Roger [4, 5]
Stephens, Matthew [6]
Affiliations
[1] Univ Melbourne, Sch Math & Stat & Melbourne Integrat Genom, Melbourne, Australia
[2] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
[3] Wayne State Univ, Dept Obstet & Gynecol, Detroit, MI USA
[4] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI USA
[5] Wayne State Univ, Ctr Mol Med & Genet, Detroit, MI USA
[6] Univ Chicago, Dept Stat, Chicago, IL 60637 USA
Keywords
Multiscale Poisson processes; wavelets; differential expression analysis; high-throughput sequencing assays; high-resolution; Bayesian inference; functional data; count data; RNA-seq; DNase-seq; ATAC-seq; chromatin accessibility; RNA-SEQ; EXPRESSION ANALYSIS; OPEN CHROMATIN; IN-VIVO; ASSOCIATION
DOI
10.1214/23-AOAS1828
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Estimating and testing for differences in molecular phenotypes (e.g., gene expression, chromatin accessibility, transcription factor binding) across conditions is an important part of understanding the molecular basis of gene regulation. These phenotypes are commonly measured using high-throughput high-resolution count data that reflect how the phenotypes vary along the genome. Multiple methods have been proposed to help exploit these high-resolution measurements for differential expression analysis. However, they ignore the count nature of the data, instead using normal distributions that work well only for data with large sample sizes or high counts. Here we develop count-based methods to address this problem. We model the data for each sample using an inhomogeneous Poisson process with spatially structured underlying intensity function and then, building on multiscale models for the Poisson process, estimate and test for differences in the underlying intensity function across samples (or groups of samples). Using both simulation and real ATAC-seq data, we show that our method outperforms previous normal-based methods, especially in situations with small sample sizes or low counts.
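As a rough illustration of the multiscale idea mentioned in the abstract, the sketch below recursively splits a vector of per-base counts into a total and, at each scale, the count falling in the left half of every interval. It relies on a standard fact about Poisson counts: given the count in a parent interval, the count in its left half is binomial, with success probability equal to the left half's share of the intensity. This is only a minimal sketch under stated assumptions and is not the authors' implementation; the function names `multiscale_decompose` and `reconstruct` are hypothetical, and the input length is assumed to be a power of two.

```python
def multiscale_decompose(counts):
    """Split counts into a total plus (left, parent) pairs per scale.

    Hypothetical helper for illustration. Returns (total, levels), where
    levels[0] is the coarsest scale and levels[-1] the finest.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = []
    current = list(counts)
    while len(current) > 1:
        # Sum adjacent pairs to get the counts one scale coarser.
        parents = [current[i] + current[i + 1] for i in range(0, len(current), 2)]
        # Record the left-child count alongside its parent count.
        pairs = [(current[i], parents[i // 2]) for i in range(0, len(current), 2)]
        levels.append(pairs)
        current = parents
    levels.reverse()
    return current[0], levels


def reconstruct(total, levels):
    """Invert multiscale_decompose: rebuild the original counts."""
    current = [total]
    for pairs in levels:
        finer = []
        for (left, parent), value in zip(pairs, current):
            assert parent == value  # consistency check between scales
            finer.extend([left, parent - left])
        current = finer
    return current


if __name__ == "__main__":
    total, levels = multiscale_decompose([3, 0, 1, 4, 0, 0, 2, 6])
    print(total)
    print(levels)
    print(reconstruct(total, levels))
```

Under this representation, a difference in intensity between groups at a given location and scale would appear as a difference in the corresponding binomial proportion, which is one way to read "estimate and test for differences in the underlying intensity function".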
Pages: 1773-1788
Page count: 16
Related papers
50 in total
  • [31] High-throughput identification of novel conotoxins from the Chinese tubular cone snail (Conus betulinus) by multi-transcriptome sequencing
    Peng, Chao
    Yao, Ge
    Gao, Bing-Miao
    Fan, Chong-Xu
    Bian, Chao
    Wang, Jintu
    Cao, Ying
    Wen, Bo
    Zhu, Yabing
    Ruan, Zhiqiang
    Zhao, Xiaofei
    You, Xinxin
    Bai, Jie
    Li, Jia
    Lin, Zhilong
    Zou, Shijie
    Zhang, Xinhui
    Qiu, Ying
    Chen, Jieming
    Coon, Steven L.
    Yang, Jiaan
    Chen, Ji-Sheng
    Shi, Qiong
    GIGASCIENCE, 2016, 5
  • [32] Application of high-throughput next-generation sequencing for HLA typing of DNA extracted from postprocessing cord blood units
    Seshasubramanian, Vani
    Venugopal, Meganathan
    Kannan, Aruna D. S.
    Naganathan, Chandramouleeswaran
    Manisekar, Nirmal K.
    Kumar, Yogesh N.
    Narayan, Saranya
    Periathiruvadi, Srinivasan
    HLA, 2019, 94 (02) : 141 - 146
  • [33] eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing
    Yuan, Tiezheng
    Huang, Xiaoyi
    Dittmar, Rachel L.
    Du, Meijun
    Kohli, Manish
    Boardman, Lisa
    Thibodeau, Stephen N.
    Wang, Liang
    BMC GENOMICS, 2014, 15
  • [34] Discovery and molecular characterization of a new cryptovirus dsRNA genome from Japanese persimmon through conventional cloning and high-throughput sequencing
    M. Morelli
    M. Chiumenti
    A. De Stradis
    P. La Notte
    A. Minafra
    Virus Genes, 2015, 50 : 160 - 164
  • [35] Reconstruction of small subunit ribosomal RNA from high-throughput sequencing data: A comparative study of metagenomics and total RNA sequencing
    Hempel, Christopher A.
    Carson, Shea E. E.
    Elliott, Tyler A.
    Adamowicz, Sarah J.
    Steinke, Dirk
    METHODS IN ECOLOGY AND EVOLUTION, 2023, 14 (08) : 2049 - 2064
  • [36] Identifying T Cell Receptors from High-Throughput Sequencing: Dealing with Promiscuity in TCRα and TCRβ Pairing
    Lee, Edward S.
    Thomas, Paul G.
    Mold, Jeff E.
    Yates, Andrew J.
    PLOS COMPUTATIONAL BIOLOGY, 2017, 13 (01)
  • [37] High-throughput sequencing for diagnosing platelet disorders: lessons learned from exploring the causes of bleeding disorders
    Heremans, J.
    Freson, K.
    INTERNATIONAL JOURNAL OF LABORATORY HEMATOLOGY, 2018, 40 : 89 - 96
  • [38] Spectrum of mutations in monogenic diabetes genes identified from high-throughput DNA sequencing of 6888 individuals
    Bansal, Vikas
    Gassenhuber, Johann
    Phillips, Tierney
    Oliveira, Glenn
    Harbaugh, Rebecca
    Villarasa, Nikki
    Topol, Eric J.
    Seufferlein, Thomas
    Boehm, Bernhard O.
    BMC MEDICINE, 2017, 15
  • [39] High-throughput sequencing reveals differences in microbial community structure and diversity in the conjunctival tissue of healthy and type 2 diabetic mice
    Li, Fengjiao
    Yang, Shuo
    Ma, Ji
    Zhao, Xiaowen
    Chen, Meng
    Wang, Ye
    BMC MICROBIOLOGY, 2024, 24 (01)
  • [40] Direct mutation analysis by high-throughput sequencing: From germline to low-abundant, somatic variants
    Gundry, Michael
    Vijg, Jan
    MUTATION RESEARCH-FUNDAMENTAL AND MOLECULAR MECHANISMS OF MUTAGENESIS, 2012, 729 (1-2) : 1 - 15