PLNseq: a multivariate Poisson lognormal distribution for high-throughput matched RNA-sequencing read count data

Cited by: 11
Authors
Zhang, Hong [1]
Xu, Jinfeng [2]
Jiang, Ning [1]
Hu, Xiaohua [1]
Luo, Zewei [1]
Affiliations
[1] Fudan Univ, Sch Life Sci, Dept Biostat & Computat Biol, Shanghai 200433, Peoples R China
[2] NYU, Sch Med, Dept Populat Hlth, Div Biostat, New York, NY 10003 USA
Funding
National Natural Science Foundation of China
Keywords
RNA-seq; differential expression analysis; Poisson lognormal model; matched samples; SEQ; NORMALIZATION; GENES; SAGE
DOI
10.1002/sim.6449
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
High-throughput RNA-sequencing (RNA-seq) technology provides an attractive platform for gene expression analysis. In many experimental settings, RNA-seq read counts are measured from matched samples or taken from the same subject under multiple treatment conditions. The induced correlation should therefore be evaluated and taken into account when deriving tests of differential expression (DE). We propose a novel method, PLNseq, which uses a multivariate Poisson lognormal distribution to model matched read count data. The correlation is modeled directly through Gaussian random effects, and inferences are made by likelihood methods. A three-stage numerical algorithm is developed to estimate the unknown parameters and conduct DE analysis. Results on simulated data demonstrate that our method performs reasonably well in terms of parameter estimation, DE analysis power, and robustness. PLNseq also controls false discovery rates (FDRs) better than the benchmarks edgeR and DESeq2 in situations where the correlation differs across genes but can still be accurately estimated. Furthermore, direct evaluation of the correlation through PLNseq enables us to develop a new and more powerful test for DE analysis. An application to a lung cancer study illustrates the practical utility of our method. An R package implementing the method is also publicly available. Copyright (c) 2015 John Wiley & Sons, Ltd.
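As a rough illustration of the kind of model the abstract describes, the sketch below simulates matched read counts for a single gene from a multivariate Poisson lognormal distribution: correlated Gaussian random effects are added to condition-specific log means, and counts are drawn from a Poisson distribution given the resulting rates. This is not the authors' PLNseq implementation or their three-stage algorithm. The function name `simulate_matched_pln`, the exchangeable covariance structure, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_matched_pln(n_subjects, log_means, sigma, rho, seed=0):
    """Simulate matched counts for one gene (hypothetical helper, not PLNseq).

    Returns an integer array with one row per subject and one column per
    condition.
    """
    rng = np.random.default_rng(seed)
    log_means = np.asarray(log_means, dtype=float)
    k = log_means.size
    # Assumed exchangeable covariance: variance sigma^2, common correlation rho.
    cov = sigma**2 * ((1.0 - rho) * np.eye(k) + rho * np.ones((k, k)))
    # Gaussian random effects shared across a subject's matched samples.
    effects = rng.multivariate_normal(np.zeros(k), cov, size=n_subjects)
    # Conditional on the random effects, counts are independent Poisson.
    rates = np.exp(log_means + effects)
    return rng.poisson(rates)


counts = simulate_matched_pln(5000, log_means=[4.0, 4.5], sigma=0.5, rho=0.8)
sample_corr = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
print(counts.shape, round(float(sample_corr), 2))
```

The correlation between the two columns of counts comes entirely from the shared random effects; setting `rho=0` in this sketch would yield approximately uncorrelated matched counts.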
Pages: 1577-1589
Page count: 13