A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach

Times cited: 5
Authors
Chen, Hao [1]
Xia, Yin [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA
[2] Fudan Univ, Sch Management, Dept Stat, Shanghai 200433, Peoples R China
Keywords
Covariance matrix estimation; High-dimensional test; Nearest neighbor; Two-sample test; Multivariate normality; Selection; Classification; Model
DOI
10.1080/01621459.2021.1953507
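Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714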
Abstract
Many statistical methodologies for high-dimensional data assume that the population is normal. Although a few multivariate normality tests have been proposed, to the best of our knowledge none of them can properly control the Type I error when the dimension is larger than the number of observations. In this work, we propose a novel nonparametric test that uses nearest neighbor information. The proposed method guarantees asymptotic Type I error control in the high-dimensional setting. Simulation studies verify that the proposed test attains its nominal empirical size when the dimension grows with the sample size, and show that it has superior power compared with alternative methods. We also illustrate our approach on two datasets widely used in the high-dimensional classification and clustering literature, where deviations from the normality assumption may lead to invalid conclusions.
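The abstract does not state the test statistic. The sketch below is only an illustration of how nearest neighbor information can be turned into a normality check. It is not the authors' procedure. It draws a synthetic sample from a normal distribution fitted to the data. It then computes a Schilling/Henze-style nearest neighbor two-sample statistic: the fraction of pooled points whose nearest neighbor comes from their own sample. The statistic is calibrated with a parametric bootstrap. Several pieces are hypothetical choices made for this example:

- the function names `nn_same_sample_fraction`, `fit_normal` and `nn_normality_pvalue`;
- the fixed shrinkage-toward-diagonal covariance estimator controlled by `shrink`, a placeholder for a proper high-dimensional covariance estimator;
- the bootstrap calibration with `n_boot` replicates.

```python
import numpy as np


def nn_same_sample_fraction(x, y):
    """Fraction of pooled points whose Euclidean nearest neighbour lies in the same sample."""
    z = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)])
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                        # a point is not its own neighbour
    nn = np.argmin(d2, axis=1)
    return float(np.mean(labels[nn] == labels))


def fit_normal(x, shrink):
    """Sample mean and a covariance shrunk toward its diagonal.

    Hypothetical stand-in for a proper high-dimensional covariance estimator. For
    0 < shrink <= 1 and positive coordinate variances it is positive definite even when p > n.
    """
    mu = x.mean(axis=0)
    s = np.atleast_2d(np.cov(x, rowvar=False))
    cov = (1.0 - shrink) * s + shrink * np.diag(np.diag(s))
    return mu, cov


def nn_normality_pvalue(x, n_boot=99, shrink=0.5, seed=0):
    """Hypothetical nearest-neighbour normality check calibrated by a parametric bootstrap."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]

    def stat(sample):
        # Compare the sample with a synthetic sample drawn from its own fitted normal.
        mu, cov = fit_normal(sample, shrink)
        ref = rng.multivariate_normal(mu, cov, size=n)
        return nn_same_sample_fraction(sample, ref)

    t_obs = stat(x)
    # Null distribution: repeat the whole procedure on data simulated from the fitted normal.
    mu, cov = fit_normal(x, shrink)
    t_null = np.array([stat(rng.multivariate_normal(mu, cov, size=n)) for _ in range(n_boot)])
    p_value = (1 + np.sum(t_null >= t_obs)) / (n_boot + 1)
    return t_obs, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((40, 100))  # n = 40 observations, p = 100 > n
    t, p = nn_normality_pvalue(x, n_boot=49)
    print(f"statistic = {t:.3f}, bootstrap p-value = {p:.3f}")
```

A larger fraction of same-sample nearest neighbors means the data are easier to distinguish from the fitted normal. The reference sample depends on parameters estimated from the data, so the observed statistic is compared with a parametric-bootstrap null rather than a permutation null. No particular output is claimed for the demo run, and this sketch makes no claim about power or size.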
Pages: 719-731
Number of pages: 13