A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach

Cited by: 5
Authors
Chen, Hao [1]
Xia, Yin [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
[2] Fudan Univ, Sch Management, Dept Stat, Shanghai 200433, Peoples R China
Keywords
Covariance matrix estimation; High-dimensional test; Nearest neighbor; 2-SAMPLE TEST; MULTIVARIATE NORMALITY; SELECTION; CLASSIFICATION; MODEL
DOI
10.1080/01621459.2021.1953507
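Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract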
Many statistical methodologies for high-dimensional data assume that the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge none of them properly controls the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses nearest neighbor information. The proposed method guarantees asymptotic Type I error control in the high-dimensional setting. Simulation studies verify the empirical size of the proposed test when the dimension grows with the sample size and, at the same time, show that the new test has superior power compared with alternative methods. We also illustrate our approach on two widely used datasets from the high-dimensional classification and clustering literature, where deviation from the normality assumption may lead to invalid conclusions.
Pages: 719-731
Number of pages: 13