A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach

Cited by: 5

Authors:

Chen, Hao [1]

Xia, Yin [2]

Affiliations:

[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA

[2] Fudan Univ, Sch Management, Dept Stat, Shanghai 200433, Peoples R China

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2023 / Vol. 118 / Issue 541

Keywords:

Covariance matrix estimation; High-dimensional test; Nearest neighbor; 2-sample test; Multivariate normality; Selection; Classification; Model

DOI:

10.1080/01621459.2021.1953507

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Many statistical methodologies for high-dimensional data assume the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge, none of them can properly control the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses the nearest neighbor information. The proposed method guarantees the asymptotic Type I error control under the high-dimensional setting. Simulation studies verify the empirical size performance of the proposed test when the dimension grows with the sample size and at the same time exhibit a superior power performance of the new test compared with alternative methods. We also illustrate our approach through two popularly used datasets in high-dimensional classification and clustering literatures where deviation from the normality assumption may lead to invalid conclusions.


Pages: 719-731

Number of pages: 13

