Prior robust empirical Bayes inference for large-scale data by conditioning on rank with application to microarray data

Cited by: 3
Authors
Liao, J. G. [1 ]
McMurry, Timothy [2 ]
Berg, Arthur [1 ]
Affiliations
[1] Penn State Univ, Div Biostat & Bioinformat, Hershey, PA 17033 USA
[2] Univ Virginia, Div Biostat, Charlottesville, VA 22908 USA
Keywords
Bayesian shrinkage; Confidence intervals; Ranking bias; Robust multiple estimation; MULTIPLE CONFIDENCE-INTERVALS; GENE-EXPRESSION; SELECTION; MODEL
DOI
10.1093/biostatistics/kxt026
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on the rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method's prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology.
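The rank-conditioning idea in the abstract can be illustrated with a small simulation. The sketch below is not the rank.Shrinkage API and is not taken from the paper; it is a minimal Python illustration under assumed conditions: a known common noise standard deviation `sigma`, a working prior that can be sampled through a hypothetical `draw_prior` callable, and point estimates only (no confidence intervals). It estimates the mean error at each rank by simulating from the working prior, then subtracts that rank-specific mean error from each observation.

```python
import random


def rank_conditioned_estimates(x, sigma, draw_prior, n_sim=200, seed=0):
    """Shrink each observation by the simulated mean error at its rank.

    x          : list of observed values, assumed x_i = theta_i + noise
    sigma      : assumed known noise standard deviation
    draw_prior : hypothetical callable rng -> one draw from the working prior
    """
    rng = random.Random(seed)
    n = len(x)
    err_sum = [0.0] * n
    for _ in range(n_sim):
        # Simulate a full data set of the same size from the working prior.
        theta = [draw_prior(rng) for _ in range(n)]
        obs = [t + rng.gauss(0.0, sigma) for t in theta]
        # Record the error (observation minus truth) at each rank position.
        order = sorted(range(n), key=obs.__getitem__)
        for r, i in enumerate(order):
            err_sum[r] += obs[i] - theta[i]
    mean_err = [s / n_sim for s in err_sum]

    # Correct each real observation by the mean error attached to its rank.
    order = sorted(range(n), key=x.__getitem__)
    est = [0.0] * n
    for r, i in enumerate(order):
        est[i] = x[i] - mean_err[r]
    return est


# Illustrative data: 200 effects from N(0, 1) observed with N(0, 1) noise.
data_rng = random.Random(1)
x = [data_rng.gauss(0.0, 1.0) + data_rng.gauss(0.0, 1.0) for _ in range(200)]
est = rank_conditioned_estimates(x, 1.0, lambda rng: rng.gauss(0.0, 1.0))
```

Because the correction depends on an observation's rank rather than on its value, the most extreme observations receive the largest adjustments toward the center, which is where ranking bias is strongest.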
Pages: 60-73
Page count: 14
Related papers
50 in total
  • [21] Data-driven Authoring of Large-scale Ecosystems
    Kapp, Konrad
    Gain, James
    Guerin, Eric
    Galin, Eric
    Peytavie, Adrien
    ACM TRANSACTIONS ON GRAPHICS, 2020, 39 (06):
  • [22] Optimizing data stream processing for large-scale applications
    Cappellari, Paolo
    Roantree, Mark
    Chun, Soon Ae
    SOFTWARE-PRACTICE & EXPERIENCE, 2018, 48 (09) : 1607 - 1641
  • [23] Large-Scale Data Analysis Using Heuristic Methods
    Dzemyda, Gintautas
    Sakalauskas, Leonidas
    INFORMATICA, 2011, 22 (01) : 1 - 10
  • [24] Particle network EnKF for large-scale data assimilation
    Li, Xinjia
    Lu, Wenlian
    FRONTIERS IN PHYSICS, 2022, 10
  • [25] Large-Scale Generation and Validation of Synthetic PMU Data
    Idehen, Ikponmwosa
    Jang, Wonhyeok
    Overbye, Thomas J.
    IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (05) : 4290 - 4298
  • [26] A Quick Guide to Large-Scale Genomic Data Mining
    Huttenhower, Curtis
    Hofmann, Oliver
    PLOS COMPUTATIONAL BIOLOGY, 2010, 6 (05) : 1 - 6
  • [27] Large-scale labeling and assessment of sex bias in publicly available expression data
    Flynn, Emily
    Chang, Annie
    Altman, Russ B.
    BMC BIOINFORMATICS, 2021, 22 (01)
  • [28] Robust estimation and empirical likelihood inference with exponential squared loss for panel data models
    Li, Shaomin
    Wang, Kangning
    Ren, Yanyan
    ECONOMICS LETTERS, 2018, 164 : 19 - 23
  • [29] A dynamic, interpretable, and robust hybrid data analytics system for train movements in large-scale railway networks
    Oneto, Luca
    Buselli, Irene
    Lulli, Alessandro
    Canepa, Renzo
    Petralli, Simone
    Anguita, Davide
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2020, 9 (01) : 95 - 111
  • [30] Sparse cluster analysis of large-scale discrete variables with application to single nucleotide polymorphism data
    Wu, Baolin
    JOURNAL OF APPLIED STATISTICS, 2013, 40 (02) : 358 - 367