Extended likelihood approach to large-scale multiple testing

Cited by: 16
Authors
Lee, Youngjo [1]
Bjornstad, Jan F. [2]
Affiliations
[1] Seoul Natl Univ, Seoul 151, South Korea
[2] Stat Norway, N-0033 Oslo, Norway
Funding
National Research Foundation of Singapore
Keywords
Extended likelihood; False discovery rate; Likelihood; Likelihood ratio test; Maximum likelihood; Multiple testing; Empirical Bayes; Microarrays
DOI
10.1111/rssb.12005
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
To date, only frequentist, Bayesian and empirical Bayes approaches have been studied for the large-scale inference problem of simultaneously testing hundreds or thousands of hypotheses. Their derivations start with summary statistics rather than with a model for the basic responses. As a consequence, testing procedures have been developed without necessarily checking model assumptions, and empirical null distributions are needed to avoid rejecting all null hypotheses when the sample sizes are large. Nevertheless, these procedures may not be statistically efficient. We present the multiple-testing problem as a multiple-prediction problem of whether each null hypothesis is true or not. We introduce hierarchical random-effect models for the basic responses and show how the extended likelihood is built. It is shown that the likelihood prediction has a certain oracle property. The extended likelihood leads to new testing procedures, which are optimal for the usual loss function in hypothesis testing. The new tests are based on certain shrinkage t-statistics; they control the local probability of false discovery for individual tests so as to maintain the global frequentist false discovery rate, and they do not require an empirical null distribution for the shrinkage t-statistics. Conditions are given under which these false discovery rates vanish. Three examples illustrate how to use the likelihood method in practice. A numerical study shows that the likelihood approach can greatly improve on existing methods and that finding the best-fitting model is crucial for the behaviour of the test procedures.
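The abstract does not spell out the paper's shrinkage t-statistics or its extended-likelihood estimators, so what follows is only a generic sketch of the idea of controlling each test's local probability of false discovery so that the global FDR stays in check. It assumes a two-group normal mixture for the test statistics, z ~ pi0*N(0,1) + (1-pi0)*N(alt_mean, alt_sd^2), with pi0, alt_mean and alt_sd treated as known. The function names local_fdr and reject_by_lfdr are hypothetical. The selection rule (reject the smallest local fdr values while their running average stays at or below alpha) is a standard empirical Bayes device, not the authors' procedure.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Density of N(mean, sd^2) evaluated at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z: Sequence[float], pi0: float,
              alt_mean: float, alt_sd: float) -> List[float]:
    """Posterior probability that each null hypothesis is true.

    Assumed (hypothetical) two-group model:
        z ~ pi0 * N(0, 1) + (1 - pi0) * N(alt_mean, alt_sd^2)
    """
    out = []
    for zi in z:
        null_part = pi0 * normal_pdf(zi)                           # H0 true
        alt_part = (1.0 - pi0) * normal_pdf(zi, alt_mean, alt_sd)  # H0 false
        out.append(null_part / (null_part + alt_part))
    return out


def reject_by_lfdr(lfdr: Sequence[float], alpha: float) -> List[int]:
    """Return indices of rejected hypotheses.

    Hypotheses are taken in increasing order of local fdr. The running
    mean of the sorted values estimates the FDR of the rejected set and
    never decreases, so we stop at the first point where it exceeds alpha.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    rejected: List[int] = []
    running_sum = 0.0
    for k, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        if running_sum / k > alpha:
            break
        rejected.append(i)
    return sorted(rejected)


if __name__ == "__main__":
    # Illustrative z-values, not data from the paper.
    z = [0.1, -0.5, 3.5, 4.2, 1.0, 2.8]
    lfdr = local_fdr(z, pi0=0.9, alt_mean=3.0, alt_sd=1.0)
    print(reject_by_lfdr(lfdr, alpha=0.1))  # -> [2, 3, 5]
```

With these made-up values, the three largest z-values have local fdr of roughly 0.003, 0.02 and 0.15. Their average stays well below 0.1, and adding the fourth candidate (local fdr near 0.98) would push it above 0.1. In practice pi0 and the alternative density would have to be estimated, which is where model choice matters.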
Pages: 553-575
Number of pages: 23
Related papers
50 in total
  • [41] UGM: a more stable procedure for large-scale multiple testing problems, new solutions to identify oncogene
    Liu, Chengyou
    Zhou, Leilei
    Wang, Yuhe
    Tian, Shuchang
    Zhu, Junlin
    Qin, Hang
    Ding, Yong
    Jiang, Hongbing
    THEORETICAL BIOLOGY AND MEDICAL MODELLING, 2019, 16 (01)
  • [42] Large-Scale Simultaneous Testing Using Kernel Density Estimation
    Ghosh, Santu
    Polansky, Alan M.
    SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY, 2022, 84 (02) : 808 - 843
  • [43] The optimal discovery procedure in multiple significance testing: an empirical Bayes approach
    Noma, Hisashi
    Matsui, Shigeyuki
    STATISTICS IN MEDICINE, 2012, 31 (02) : 165 - 176
  • [45] Large-scale dependent multiple testing via hidden semi-Markov models
    Wang, Jiangzhou
    Wang, Pengfei
    COMPUTATIONAL STATISTICS, 2024, 39 : 1093 - 1126
  • [46] A new approach to multiple testing of grouped hypotheses
    Liu, Yanping
    Sarkar, Sanat K.
    Zhao, Zhigen
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2016, 179 : 1 - 14
  • [47] On the empirical Bayes approach to the problem of multiple testing
    Bogdan, Malgorzata
    Ghosh, Jayanta K.
    Ochman, Aleksandra
    Tokdar, Surya T.
    QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL, 2007, 23 (06) : 727 - 739
  • [48] MixTwice: large-scale hypothesis testing for peptide arrays by variance mixing
    Zheng, Zihao
    Mergaert, Aisha M.
    Ong, Irene M.
    Shelef, Miriam A.
    Newton, Michael A.
    BIOINFORMATICS, 2021, 37 (17) : 2637 - 2643
  • [49] An adaptive approach for online monitoring of large-scale data streams
    Cao, Shuchen
    Zhang, Ruizhi
    IISE TRANSACTIONS, 2025, 57 (02) : 119 - 130
  • [50] A nonparametric empirical Bayes approach to large-scale multivariate regression
    Wang, Yihe
    Zhao, Sihai Dave
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 156