Prior robust empirical Bayes inference for large-scale data by conditioning on rank with application to microarray data

Cited by: 3

Authors:

Liao, J. G. [1]

McMurry, Timothy [2]

Berg, Arthur [1]

Affiliations:

[1] Penn State Univ, Div Biostat & Bioinformat, Hershey, PA 17033 USA

[2] Univ Virginia, Div Biostat, Charlottesville, VA 22908 USA

Source:

BIOSTATISTICS | 2014 / Vol. 15 / Issue 01

Keywords:

Bayesian shrinkage; Confidence intervals; Ranking bias; Robust multiple estimation; MULTIPLE CONFIDENCE-INTERVALS; GENE-EXPRESSION; SELECTION; MODEL;

DOI:

10.1093/biostatistics/kxt026

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Empirical Bayes methods have been extensively used for microarray data analysis by modeling the large number of unknown parameters as random effects. Empirical Bayes allows borrowing information across genes and can automatically adjust for multiple testing and selection bias. However, the standard empirical Bayes model can perform poorly if the assumed working prior deviates from the true prior. This paper proposes a new rank-conditioned inference in which the shrinkage and confidence intervals are based on the distribution of the error conditioned on the rank of the data. Our approach is in contrast to a Bayesian posterior, which conditions on the data themselves. The new method is almost as efficient as standard Bayesian methods when the working prior is close to the true prior, and it is much more robust when the working prior is not close. In addition, it allows a more accurate (but also more complex) non-parametric estimate of the prior to be easily incorporated, resulting in improved inference. The new method's prior robustness is demonstrated via simulation experiments. Application to a breast cancer gene expression microarray dataset is presented. Our R package rank.Shrinkage provides a ready-to-use implementation of the proposed methodology.
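
The idea of conditioning on rank rather than on the data themselves can be illustrated with a small simulation. The sketch below is not the authors' rank.Shrinkage implementation. It assumes a normal working prior and normal noise with known standard deviations, and the function names `rank_conditioned_bias` and `rank_shrink` are hypothetical. It estimates the average error of the observation at each rank under the working prior, then subtracts that error from the observation holding the same rank.

```python
import random


def rank_conditioned_bias(n, prior_sd, noise_sd, n_sim=200, seed=0):
    """Monte Carlo estimate of E[x - theta | rank of x] for each rank.

    Assumption: theta ~ N(0, prior_sd^2) is the working prior and
    x = theta + N(0, noise_sd^2) noise. Both are illustrative choices.
    """
    rng = random.Random(seed)
    bias = [0.0] * n
    for _ in range(n_sim):
        theta = [rng.gauss(0.0, prior_sd) for _ in range(n)]
        x = [t + rng.gauss(0.0, noise_sd) for t in theta]
        # Indices of x sorted from smallest to largest observation.
        order = sorted(range(n), key=lambda i: x[i])
        for r, i in enumerate(order):
            # Accumulate the running mean of the error at rank r.
            bias[r] += (x[i] - theta[i]) / n_sim
    return bias


def rank_shrink(x, bias):
    """Subtract the rank-specific mean error from each observation."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    est = [0.0] * len(x)
    for r, i in enumerate(order):
        est[i] = x[i] - bias[r]
    return est


if __name__ == "__main__":
    n = 50
    bias = rank_conditioned_bias(n, prior_sd=1.0, noise_sd=1.0, n_sim=300, seed=1)
    data_rng = random.Random(2)
    # Simulated observations standing in for gene-level statistics.
    x = [data_rng.gauss(0.0, 1.0) + data_rng.gauss(0.0, 1.0) for _ in range(n)]
    est = rank_shrink(x, bias)
    print("largest observation:", max(x))
    print("its rank-shrunk estimate:", est[x.index(max(x))])
```

In this sketch the extreme ranks receive the largest corrections, which reflects the ranking bias named in the keywords: the largest observations tend to overstate their underlying parameters, and the smallest tend to understate them.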


Pages: 60-73

Page count: 14
