Testing linear hypotheses in high-dimensional regressions

Cited by: 20
Authors
Bai, Zhidong [1, 2]
Jiang, Dandan [3]
Yao, Jian-feng [4]
Zheng, Shurong [1, 2]
Affiliations
[1] NE Normal Univ, KLASMOE, Changchun 130024, Peoples R China
[2] NE Normal Univ, Sch Math & Stat, Changchun 130024, Peoples R China
[3] Jilin Univ, Inst Math, Changchun 130021, Peoples R China
[4] Univ Hong Kong, Dept Stat & Actuarial Sci, Pokfulam, Hong Kong, Peoples R China
Keywords
high-dimensional data; multivariate regression; multivariate analysis of variance; Wilks' test; multiple sample significance test; random matrices; COVARIANCE-MATRIX; SAMPLE
DOI
10.1080/02331888.2012.708031
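Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract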
For a multivariate linear model, Wilks' likelihood ratio test (LRT) is one of the cornerstone tools. However, computing its quantiles under the null or the alternative hypothesis requires complex analytic approximations and, more importantly, these distributional approximations are feasible only for a moderate dimension of the dependent variable, say p ≤ 20. On the other hand, assuming that the data dimension p and the number q of regression variables are fixed while the sample size n grows, several asymptotic approximations have been proposed in the literature for Wilks' Λ, including the widely used chi-square approximation. In this paper, we consider necessary modifications to Wilks' test in a high-dimensional context, specifically assuming a high data dimension p and a large sample size n. Based on recent random matrix theory, the correction we propose to Wilks' test is asymptotically Gaussian under the null hypothesis, and simulations demonstrate that the corrected LRT has very satisfactory size and power, certainly in the large-p, large-n setting, but also for moderately large data dimensions such as p = 30 or p = 50. As a byproduct, we explain why the standard chi-square approximation fails for high-dimensional data. We also introduce a new procedure, valid for high-dimensional data, for the classical multiple sample significance test in multivariate analysis of variance.
Pages: 1207-1223
Page count: 17