Large-Scale Global and Simultaneous Inference: Estimation and Testing in Very High Dimensions

Times cited: 13
Authors
Cai, T. Tony [1]
Sun, Wenguang [2]
Affiliations
[1] Univ Penn, Wharton Sch, Dept Stat, Philadelphia, PA 19104 USA
[2] Univ Southern Calif, Marshall Sch Business, Dept Data Sci & Operat, Los Angeles, CA 90089 USA
Source
ANNUAL REVIEW OF ECONOMICS, Vol. 9, 2017
Funding
US National Science Foundation; US National Institutes of Health
Keywords
compound decision problem; dependence; detection boundary; false discovery rate; global inference; multiple testing; null distribution; signal detection; simultaneous inference; sparsity; FALSE DISCOVERY RATE; REJECTIVE MULTIPLE TEST; HIDDEN MARKOV-MODELS; CONFIDENCE-INTERVALS; NULL HYPOTHESES; EMPIRICAL BAYES; STATISTICAL SIGNIFICANCE; SELECTIVE INFERENCE; STEPUP PROCEDURES; HIGHER CRITICISM
DOI
10.1146/annurev-economics-063016-104355
Chinese Library Classification number
F [Economics]
Subject classification code
02
Abstract
Due to rapid technological advances, researchers are now able to collect and analyze ever larger data sets. Statistical inference for big data often requires solving thousands or even millions of parallel inference problems simultaneously. This poses significant challenges and calls for new principles, theories, and methodologies. This review provides a selective survey of some recently developed methods and results for large-scale statistical inference, including detection, estimation, and multiple testing. We begin with the global testing problem, where the goal is to detect the existence of sparse signals in a data set, and then move to the problem of estimating the proportion of nonnull effects. Finally, we focus on multiple testing with false discovery rate (FDR) control. The FDR provides a powerful and practical approach to large-scale multiple testing and has been successfully used in a wide range of applications. We discuss several effective data-driven procedures and also present efficient strategies to handle various grouping, hierarchical, and dependency structures in the data.
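The abstract refers to FDR control without stating a rule. For orientation only, the following is a minimal Python sketch of the classical Benjamini-Hochberg (1995) step-up procedure, the standard baseline for FDR control. It is not one of the data-driven procedures surveyed in the review, and the function name `benjamini_hochberg` is hypothetical. With independent p-values, the rule controls the FDR at level alpha * m0 / m <= alpha, where m0 is the number of true nulls among the m hypotheses.

```python
from typing import Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> list[int]:
    """Return the sorted 0-based indices rejected by the BH step-up rule at level alpha.

    Hypothetical helper for illustration; assumes independent p-values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Rank hypotheses by p-value, smallest first, keeping original indices.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up: reject all k_max smallest p-values, including any that
    # missed their own threshold at a lower rank.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # 0.03 misses its threshold 0.025 at rank 2, but 0.036 <= 0.0375 at
    # rank 3, so the step-up rule rejects the three smallest p-values.
    print(benjamini_hochberg([0.001, 0.03, 0.036, 0.9]))  # [0, 1, 2]
```

In practice, a maintained implementation such as `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` would be preferable to this sketch.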
Pages: 411-439
Number of pages: 29
References: 133