Adaptive Testing for High-Dimensional Data

Cited by: 0
Authors
Zhang, Yangfan [1]
Wang, Runmin [2]
Shao, Xiaofeng [3]
Affiliations
[1] Two Sigma Investments, New York, NY USA
[2] Texas A&M Univ, Dept Stat, 3143 TAMU, College Stn, TX 77843 USA
[3] Univ Illinois, Dept Stat, Champaign, IL USA
Keywords
Independence testing; Simultaneous testing; Spatial sign; U-statistics; Higher criticism; Covariance matrix; Two-sample test; Asymptotic distributions; Independence; Coherence; Signals; ANOVA
DOI
10.1080/01621459.2024.2439617
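Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract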
In this article, we propose a class of L_q-norm based U-statistics for a family of global testing problems related to high-dimensional data. This includes testing of mean vector and its spatial sign, simultaneous testing of linear model coefficients, and testing of component-wise independence for high-dimensional observations, among others. Under the null hypothesis, we derive asymptotic normality and independence between L_q-norm based U-statistics for several qs under mild moment and cumulant conditions. A simple combination of two studentized L_q-based test statistics via their p-values is proposed and is shown to attain great power against alternatives of different sparsity. Our work is a substantial extension of He et al., which is mostly focused on mean and covariance testing, and we manage to provide a general treatment of asymptotic independence of L_q-norm based U-statistics for a wide class of kernels. To alleviate the computation burden, we introduce a variant of the proposed U-statistics by using the monotone indices in the summation, resulting in a U-statistic with asymmetric kernel. A dynamic programming method is introduced to reduce the computational cost from O(n^{qr}), which is required for the calculation of the full U-statistic, to O(n^r), where r is the order of the kernel. Numerical results further corroborate the advantage of the proposed adaptive test as compared to some existing competitors. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
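The monotone-index idea in the abstract can be illustrated in the simplest setting. The sketch below is not the authors' implementation. It assumes a kernel of order r = 1 with h(x) = x (the mean-testing case), so that the sum over monotone indices i1 < ... < iq of the product of the q selected entries in one coordinate can be accumulated in a single pass over the n observations. The function names are hypothetical, the statistic is left unstudentized, and the min-p rule used to combine two p-values is one common choice for independent tests, assumed here because the abstract does not spell out the exact combination.

```python
import math
from itertools import combinations


def monotone_product_sum(x, q):
    """Sum of x[i1] * ... * x[iq] over all i1 < i2 < ... < iq in O(n * q) time."""
    # dp[k] is the sum over monotone k-tuples drawn from the entries seen so far.
    dp = [1.0] + [0.0] * q
    for value in x:
        # Update from high k to low k so each entry appears at most once per tuple.
        for k in range(q, 0, -1):
            dp[k] += dp[k - 1] * value
    return dp[q]


def brute_force_product_sum(x, q):
    """Same quantity by direct enumeration of all q-subsets; for checking only."""
    return sum(math.prod(c) for c in combinations(x, q))


def lq_mean_statistic(data, q):
    """Unstudentized sum of the per-coordinate monotone-index sums.

    `data` is a list of n observations, each a list of p numbers (assumed layout).
    """
    p = len(data[0])
    return sum(monotone_product_sum([row[j] for row in data], q) for j in range(p))


def min_p_combination(p_a, p_b):
    """Min-p combination of two p-values, valid when the two tests are independent."""
    return 1.0 - (1.0 - min(p_a, p_b)) ** 2
```

The recursion replaces enumeration of all q-subsets, whose count grows like n^q, with one pass that keeps q running totals. Kernels of order r > 1 would need a richer state than this sketch carries.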
Pages: 13
References
37 in total
[1] Andrews D.W., “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica: Journal of the Econometric Society, 59, pp. 817-858, (1991)
[2] Arcones M.A., Gine E., “On the Bootstrap of U and V Statistics,” The Annals of Statistics, 20, pp. 655-674, (1992)
[3] Arias-Castro E., Candes E.J., Plan Y., “Global Testing under Sparse Alternatives: Anova, Multiple Comparisons and the Higher Criticism,” The Annals of Statistics, 39, pp. 2533-2556, (2011)
[4] Bai Z., Jiang D., Yao J.-F., Zheng S., “Corrections to LRT on Large-Dimensional Covariance Matrix by RMT,” The Annals of Statistics, 37, pp. 3822-3840, (2009)
[5] Bai Z., Saranadasa H., “Effect of High Dimension: By an Example of a Two Sample Problem,” Statistica Sinica, 6, pp. 311-329, (1996)
[6] Cai T., Liu W., Xia Y., “Two-Sample Covariance Matrix Testing and Support Recovery in High-Dimensional and Sparse Settings,” Journal of the American Statistical Association, 108, pp. 265-277, (2013)
[7] Cai T.T., Jiang T., “Limiting Laws of Coherence of Random Matrices with Applications to Testing Covariance Structure and Construction of Compressed Sensing Matrices,” The Annals of Statistics, 39, pp. 1496-1525, (2011)
[8] Cai T.T., Liu W., Xia Y., “Two-Sample Test of High Dimensional Means Under Dependence,” Journal of the Royal Statistical Society, Series B, 76, pp. 349-372, (2014)
[9] Chakraborty A., Chaudhuri P., “Tests for High-Dimensional Data based on Means, Spatial Signs and Spatial Ranks,” The Annals of Statistics, 45, pp. 771-799, (2017)
[10] Chen S.X., Li J., Zhong P.-S., “Two-Sample and Anova Tests for High Dimensional Means,” The Annals of Statistics, 47, pp. 1443-1474, (2019)