Sequence count data are poorly fit by the negative binomial distribution

Cited by: 25
Authors
Hawinkel, Stijn [1]
Rayner, J. C. W. [2, 5]
Bijnens, Luc [3, 4]
Thas, Olivier [1, 4, 5]
Affiliations
[1] Univ Ghent, Dept Data Anal & Math Modelling, Ghent, Belgium
[2] Univ Newcastle, Ctr Comp Assisted Res Math & Its Applicat, Sch Math & Phys Sci, Newcastle, NSW, Australia
[3] Janssen Pharmaceut Co Johnson & Johnson, Quantitat Sci, Ghent, Belgium
[4] Hasselt Univ, I BioStat, Hasselt, Belgium
[5] Univ Wollongong, Natl Inst Appl Stat Res Australia NIASRA, Wollongong, NSW, Australia
Source
PLOS ONE | 2020 / Vol. 15 / No. 4
Keywords
GOODNESS-OF-FIT; RNA-SEQ DATA; MODELS
DOI
10.1371/journal.pone.0224909
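Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09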
Abstract
Sequence count data are commonly modelled using the negative binomial (NB) distribution. Several empirical studies, however, have demonstrated that methods based on the NB assumption do not always succeed in controlling the false discovery rate (FDR) at its nominal level. In this paper, we propose a dedicated statistical goodness-of-fit test for the NB distribution in regression models and demonstrate that the NB assumption is violated in many publicly available RNA-Seq and 16S rRNA microbiome datasets. The zero-inflated NB distribution was not found to give a substantially better fit. We also show that NB-based tests perform worse on the features for which the NB assumption was violated than on the features for which no significant deviation was detected. This offers an explanation for the poor behaviour of NB-based tests in many published evaluation studies. We conclude that non-parametric tests should be preferred over parametric methods.
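The abstract does not describe how the proposed test works. As a rough illustration only, the Python sketch below shows a generic parametric-bootstrap goodness-of-fit check for the NB distribution on a single sample of counts without covariates; it is not the authors' regression-based test. The NB2 parameterization (Var(Y) = mu + phi * mu^2), the method-of-moments fit, the binned Pearson-type statistic, and all function names (fit_nb_moments, nb_pmf, nb_sample, gof_statistic, nb_gof_test) are assumptions made for this example.

```python
import math
import random
from statistics import fmean, variance

# Hypothetical illustration of a generic NB goodness-of-fit check,
# NOT the test proposed in the paper (which handles regression models).


def fit_nb_moments(counts):
    """Method-of-moments fit of the NB2 model, Var(Y) = mu + phi * mu**2."""
    mu = fmean(counts)
    var = variance(counts)
    # If the sample is not overdispersed (var <= mu), clamp phi to a tiny
    # positive value, i.e. an almost-Poisson NB.
    phi = max((var - mu) / mu**2, 1e-8) if mu > 0 else 1e-8
    return mu, phi


def nb_pmf(k, mu, phi):
    """NB2 probability of count k with mean mu and dispersion phi (size r = 1/phi)."""
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    r = 1.0 / phi
    log_prob = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                - r * math.log1p(mu / r) + k * (math.log(mu) - math.log(r + mu)))
    return math.exp(log_prob)


def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def nb_sample(mu, phi, rng):
    """Draw one NB2 count via the Gamma-Poisson mixture."""
    if mu == 0:
        return 0
    # lambda ~ Gamma(shape = 1/phi, scale = mu * phi) has mean mu, variance phi * mu**2.
    lam = rng.gammavariate(1.0 / phi, mu * phi)
    return poisson_sample(lam, rng)


def gof_statistic(counts, max_bin=10):
    """Pearson-type distance between observed and fitted NB bin frequencies.

    Bins are 0, 1, ..., max_bin - 1 plus one pooled upper bin (>= max_bin).
    """
    mu, phi = fit_nb_moments(counts)
    n = len(counts)
    probs = [nb_pmf(k, mu, phi) for k in range(max_bin)]
    probs.append(max(1.0 - sum(probs), 0.0))  # P(Y >= max_bin)
    observed = [0] * (max_bin + 1)
    for c in counts:
        observed[min(c, max_bin)] += 1
    stat = 0.0
    for obs, p in zip(observed, probs):
        expected = max(n * p, 1e-12)  # guard against division by zero
        stat += (obs - expected) ** 2 / expected
    return stat


def nb_gof_test(counts, n_boot=200, max_bin=10, seed=1):
    """Parametric-bootstrap p-value; parameters are refitted in every replicate."""
    rng = random.Random(seed)
    observed_stat = gof_statistic(counts, max_bin)
    mu, phi = fit_nb_moments(counts)
    exceed = 0
    for _ in range(n_boot):
        sim = [nb_sample(mu, phi, rng) for _ in counts]
        if gof_statistic(sim, max_bin) >= observed_stat:
            exceed += 1
    return observed_stat, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    nb_data = [nb_sample(5.0, 0.3, rng) for _ in range(150)]
    bimodal = [0] * 50 + [20] * 50
    for name, data in [("NB sample", nb_data), ("bimodal sample", bimodal)]:
        stat, p = nb_gof_test(data)
        print(f"{name}: statistic = {stat:.1f}, bootstrap p = {p:.3f}")
```

A bimodal sample such as 50 zeros and 50 counts of 20 should typically give a small bootstrap p-value, because the fitted NB badly underpredicts the number of zeros. Refitting mu and phi in each bootstrap replicate accounts for the fact that both parameters are estimated from the same data that are being tested.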
Pages: 16