To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied?

Cited by: 56
Authors
Fan, Jianqing [1 ,3 ]
Hall, Peter [2 ,3 ]
Yao, Qiwei [3 ]
Affiliations
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[2] Acad Math & Syst Sci, Ctr Stat Res, Beijing, Peoples R China
[3] Univ Melbourne, Dept Math Stat, Melbourne, Vic 3010, Australia
Funding
National Science Foundation (USA)
Keywords
Bonferroni's inequality; Edgeworth expansion; genetic data; large-deviation expansion; level accuracy; microarray data; quantile estimation; skewness; Student's t statistic
DOI
10.1198/016214507000000969
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Codes
020208; 070103; 0714
Abstract
In the analysis of microarray data, and in some other contemporary statistical problems, it is not uncommon to apply hypothesis tests in a highly simultaneous way. The number, N say, of tests used can be much larger than the sample sizes, n, to which the tests are applied, yet we wish to calibrate the tests so that the overall level of the simultaneous test is accurate. Often the sampling distribution is quite different for each test, so there may not be an opportunity to combine data across samples. In this setting, how large can N be, as a function of n, before level accuracy becomes poor? Here we answer this question in cases where the statistic under test is of Student's t type. We show that if either the normal or Student t distribution is used for calibration, then the level of the simultaneous test is accurate provided that log N increases at a strictly slower rate than n^{1/3} as n diverges. On the other hand, if bootstrap methods are used for calibration, then we may choose log N almost as large as n^{1/2} and still achieve asymptotic level accuracy. The implications of these results are explored both theoretically and numerically.
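The setting the abstract describes is easy to illustrate numerically. The sketch below is not code from the paper; the sample size n, the number of tests N, the skewed exponential null distribution, and the Bonferroni-style adjustment for simultaneity are all illustrative assumptions. It simulates N one-sample Student's t tests under the null and estimates the familywise error rate when the critical value comes from the normal quantile, the Student t quantile, or a per-test bootstrap.

```python
# Minimal Monte Carlo sketch (illustrative only, not code from the paper) of
# N simultaneous one-sample Student's t tests on samples of size n, with the
# simultaneous level controlled via Bonferroni's inequality. Critical values
# come from the normal quantile, the Student t quantile, or a per-test bootstrap.
# All numerical settings (n, N, B, R, the skewed null) are demo assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, N, B, R, alpha = 15, 200, 300, 200, 0.05  # sample size, tests, bootstrap draws, MC reps, level


def t_stats(x):
    """Row-wise one-sample t statistics for H0: mean = 0; x has shape (..., n)."""
    return np.sqrt(x.shape[-1]) * x.mean(axis=-1) / x.std(axis=-1, ddof=1)


# Normal and Student t calibration: two-sided critical value at level alpha / N.
crit_norm = stats.norm.ppf(1 - alpha / (2 * N))
crit_t = stats.t.ppf(1 - alpha / (2 * N), df=n - 1)

fwer = {"normal": 0, "Student t": 0, "bootstrap": 0}
for _ in range(R):
    # Skewed null data with mean zero, so calibration error is visible for small n.
    data = rng.exponential(scale=1.0, size=(N, n)) - 1.0
    T = np.abs(t_stats(data))

    # Bootstrap calibration: resample each of the N samples, centre the resamples
    # at the observed mean, and take the (1 - alpha/N) quantile of the bootstrap
    # |t| values as that test's critical value.
    idx = rng.integers(0, n, size=(N, B, n))
    boot = data[np.arange(N)[:, None, None], idx]          # shape (N, B, n)
    T_boot = np.abs(t_stats(boot - data.mean(axis=-1)[:, None, None]))
    crit_boot = np.quantile(T_boot, 1 - alpha / N, axis=1)

    fwer["normal"] += bool(np.any(T > crit_norm))
    fwer["Student t"] += bool(np.any(T > crit_t))
    fwer["bootstrap"] += bool(np.any(T > crit_boot))

for name, count in fwer.items():
    print(f"{name:9s} calibration: estimated familywise error rate = {count / R:.3f} "
          f"(target {alpha})")
```

Because the extreme quantile 1 - alpha/(2N) is sensitive to skewness when n is small, the bootstrap critical values, which adapt to each sample, tend to track the target level more closely; the paper quantifies this gap through the permitted growth rates of log N relative to n^{1/3} and n^{1/2}.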
Pages: 1282-1288
Page count: 7