Robustness and accuracy of methods for high dimensional data analysis based on Student's t-statistic

Cited by: 46
Authors
Delaigle, Aurore [1]
Hall, Peter [2]
Jin, Jiashun [3]
Affiliations
[1] Univ Melbourne, Dept Math & Stat, Parkville, Vic 3010, Australia
[2] Univ Calif Davis, Davis, CA 95616 USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Funding
Australian Research Council; US National Science Foundation
Keywords
Bootstrap; Central limit theorem; Classification; Dimension reduction; Higher criticism; Large deviation probability; Moderate deviation probability; Ranking; Second-order accuracy; Skewness; Tail probability; Variable selection; FALSE DISCOVERY RATE; MULTIPLE TEST PROCEDURES; BOOTSTRAP; APPROXIMATIONS; CONVERGENCE; PROPORTION; ERRORS; TESTS; RATES;
DOI
10.1111/j.1467-9868.2010.00761.x
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Student's t-statistic is finding applications today that were never envisaged when it was introduced more than a century ago. Many of these applications rely on properties, e.g. robustness against heavy-tailed sampling distributions, that were not explicitly considered until relatively recently. We explore these features of the t-statistic in the context of its application to very high dimensional problems, including feature selection and ranking, the simultaneous testing of many different hypotheses and sparse, high dimensional signal detection. Robustness properties of the t-ratio are highlighted, and it is established that those properties are preserved under applications of the bootstrap. In particular, bootstrap methods correct for skewness and therefore lead to second-order accuracy, even in the extreme tails. Indeed, it is shown that the bootstrap and also the more popular but less accurate t-distribution and normal approximations are more effective in the tails than towards the middle of the distribution. These properties motivate new methods, e.g. bootstrap-based techniques for signal detection, that confine attention to the significant tail of a statistic.
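To make the abstract's central idea concrete, the following is a minimal sketch of calibrating an upper-tail probability for Student's t-statistic with the bootstrap rather than with the t-distribution or normal approximation. It is a generic bootstrap-t illustration, not the authors' exact procedure: the function names (`t_statistic`, `bootstrap_upper_tail_pvalue`), the null value `mu = 0`, the number of resamples and the exponential test data are all assumptions introduced here.

```python
import numpy as np


def t_statistic(x, mu=0.0):
    """Student's t-ratio sqrt(n) * (mean - mu) / sd for a one-sample problem."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.sqrt(n) * (x.mean() - mu) / x.std(ddof=1)


def bootstrap_upper_tail_pvalue(x, n_boot=2000, seed=0):
    """Estimate P(T >= t_obs) under H0: mean = 0 by resampling the data.

    Each resample is studentized and centred at the sample mean, so the
    bootstrap distribution imitates the null distribution of the t-ratio,
    including its skewness.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    t_obs = t_statistic(x, 0.0)
    xbar = x.mean()

    # Draw all resamples at once: one row per bootstrap replicate.
    idx = rng.integers(0, n, size=(n_boot, n))
    resamples = x[idx]
    means = resamples.mean(axis=1)
    sds = resamples.std(axis=1, ddof=1)

    # Skip degenerate resamples whose standard deviation is zero.
    valid = sds > 0
    t_star = np.sqrt(n) * (means[valid] - xbar) / sds[valid]

    # Proportion of bootstrap t-ratios at least as large as the observed one.
    p_value = float(np.mean(t_star >= t_obs))
    return t_obs, p_value


# Hypothetical skewed sample with true mean 2 (so H0: mean = 0 is false).
sample = np.random.default_rng(1).exponential(scale=1.0, size=40) + 1.0
t_obs, p_boot = bootstrap_upper_tail_pvalue(sample)
```

In a high dimensional setting one would repeat this calculation for each feature and then rank or threshold the resulting tail probabilities; that step is omitted here.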
Pages: 283-301
Page count: 19