PLNseq: a multivariate Poisson lognormal distribution for high-throughput matched RNA-sequencing read count data

Cited by: 11
Authors
Zhang, Hong [1]
Xu, Jinfeng [2]
Jiang, Ning [1]
Hu, Xiaohua [1]
Luo, Zewei [1]
Affiliations
[1] Fudan Univ, Sch Life Sci, Dept Biostat & Computat Biol, Shanghai 200433, Peoples R China
[2] NYU, Sch Med, Dept Populat Hlth, Div Biostat, New York, NY 10003 USA
Funding
National Natural Science Foundation of China
Keywords
RNA-seq; differential expression analysis; Poisson lognormal model; matched samples; SEQ; NORMALIZATION; GENES; SAGE
DOI
10.1002/sim.6449
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
High-throughput RNA-sequencing (RNA-seq) technology provides an attractive platform for gene expression analysis. In many experimental settings, RNA-seq read counts are measured from matched samples or taken from the same subject under multiple treatment conditions. The induced correlation should therefore be evaluated and taken into account when deriving tests of differential expression (DE). We propose a novel method, PLNseq, which uses a multivariate Poisson lognormal distribution to model matched read count data. The correlation is modeled directly through Gaussian random effects, and inferences are made by likelihood methods. A three-stage numerical algorithm is developed to estimate the unknown parameters and conduct DE analysis. Results on simulated data demonstrate that our method performs reasonably well in terms of parameter estimation, DE analysis power, and robustness. PLNseq also controls false discovery rates (FDRs) better than the benchmarks edgeR and DESeq2 in situations where the correlation differs across genes but can still be accurately estimated. Furthermore, direct evaluation of the correlation through PLNseq enables us to develop a new and more powerful test for DE analysis. An application to a lung cancer study illustrates the practical utility of our method. An R package implementing the method is also publicly available. Copyright (c) 2015 John Wiley & Sons, Ltd.
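As a rough illustration of the model class described in the abstract, the following Python sketch simulates matched read counts for a single gene from a bivariate Poisson lognormal distribution: correlated Gaussian random effects on the log scale, followed by Poisson sampling. It is not the authors' implementation or parameterization. The function name `simulate_pln_counts`, the compound-symmetry covariance, and all parameter values (`mu`, `sigma`, `rho`, `lib_sizes`) are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_pln_counts(mu, sigma, rho, lib_sizes, n_subjects, rng):
    """Simulate matched counts from a multivariate Poisson lognormal model.

    Hypothetical parameterization (an assumption, not taken from the paper):
      mu        : log relative abundance under each condition
      sigma     : standard deviation of the Gaussian random effects
      rho       : common correlation between matched conditions
      lib_sizes : library size (scaling factor) for each condition
    """
    mu = np.asarray(mu, dtype=float)
    k = mu.size
    # Compound-symmetry covariance: sigma^2 on the diagonal, rho * sigma^2 elsewhere.
    cov = sigma**2 * ((1.0 - rho) * np.eye(k) + rho * np.ones((k, k)))
    # Correlated Gaussian effects, one row per subject, one column per condition.
    z = rng.multivariate_normal(mean=mu, cov=cov, size=n_subjects)
    # Conditional on the Gaussian effects, counts are independent Poisson draws.
    rates = np.asarray(lib_sizes, dtype=float) * np.exp(z)
    return rng.poisson(rates)


rng = np.random.default_rng(0)
counts = simulate_pln_counts(
    mu=[-10.0, -10.0], sigma=0.5, rho=0.8,
    lib_sizes=[1e6, 1e6], n_subjects=5000, rng=rng,
)
# Correlation of log-transformed counts across the two matched conditions.
log_corr = np.corrcoef(np.log1p(counts[:, 0]), np.log1p(counts[:, 1]))[0, 1]
print(counts.shape, round(float(log_corr), 2))
```

Because the two conditions share correlated random effects, the simulated counts are positively correlated within a subject. This is the dependence that a test assuming independent samples would ignore.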
Pages: 1577-1589
Number of pages: 13