Asymptotic normality of interpoint distances for high-dimensional data with applications to the two-sample problem

Times cited: 20
Author
Li, Jun [1]
Affiliation
[1] Univ Calif Riverside, Dept Stat, 1337 Olmsted Hall, Riverside, CA 92521 USA
Keywords
Asymptotic normality; High-dimensional data; Interpoint distance; Strong mixing condition; Two-sample problem; GEOMETRIC REPRESENTATION; MILD CONDITIONS; MULTIVARIATE; TESTS
DOI
10.1093/biomet/asy020
CLC number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Interpoint distances have applications in many areas of probability and statistics. Because they are simple to compute, interpoint distance-based procedures are particularly appealing for analysing small samples of high-dimensional data. In this paper, we first study the asymptotic distribution of interpoint distances in the high-dimension, low-sample-size setting and show that, under regularity conditions, it is normal. We then construct a powerful test for the two-sample problem that is consistent against both location and scale differences. Simulations show that the test compares favourably with existing distance-based tests.
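A minimal Python sketch can illustrate the high-dimension, low-sample-size behaviour mentioned in the abstract. It assumes the simplest possible setting: two independent vectors X and Y, each with i.i.d. N(0, 1) coordinates. This is far narrower than the strong-mixing conditions treated in the paper. The sketch does not reproduce the paper's theorem or its two-sample test, and the helper names interpoint_distance and centred_distances are hypothetical. In this toy case, ||X - Y||^2 equals 2 times a chi-square variable with p degrees of freedom, so it has mean 2p and variance 8p. A delta-method argument then suggests that ||X - Y|| - sqrt(2p) is approximately N(0, 1) for large p.

```python
import math
import random
import statistics


def interpoint_distance(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def centred_distances(p, n_rep, seed=0):
    """Simulate n_rep distances ||X - Y|| for independent X, Y ~ N(0, I_p).

    Assumption (toy setting, not the paper's conditions): all coordinates
    are i.i.d. standard normal. Each distance is centred at sqrt(2p),
    its approximate mean when p is large.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out = []
    for _ in range(n_rep):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        y = [rng.gauss(0.0, 1.0) for _ in range(p)]
        out.append(interpoint_distance(x, y) - math.sqrt(2 * p))
    return out


if __name__ == "__main__":
    z = centred_distances(p=500, n_rep=1000)
    # For large p these values should look roughly standard normal:
    # sample mean near 0 and sample standard deviation near 1.
    print(f"mean = {statistics.mean(z):.3f}, sd = {statistics.stdev(z):.3f}")
```

With p = 500 and 1000 replications, the printed mean should be close to 0 and the standard deviation close to 1. The exact values depend on the seed. A histogram of z would show the approximately normal shape, in line with the flavour of the abstract's first result in this simple case.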
Pages: 529-546
Number of pages: 18
Related papers
24 entries in total
[1] Ahn, Jeongyoun; Marron, J. S.; Muller, Keith M.; Chi, Yueh-Yun. The high-dimension, low-sample-size geometric representation holds under mild conditions [J]. BIOMETRIKA, 2007, 94(3): 760-766.
[2] Aoshima, Makoto; Yata, Kazuyoshi. Asymptotic Normality for Inference on Multisample, High-Dimensional Mean Vectors Under Mild Conditions [J]. METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY, 2015, 17(2): 419-439.
[3] Baringhaus, L; Franz, C. On a new multivariate two-sample test [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2004, 88(1): 190-206.
[4] Bartoszynski, R; Pearl, DK; Lawrence, J. A multidimensional goodness-of-fit test based on interpoint distances [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1997, 92(438): 577-586.
[5] Biswas, Munmun; Mukhopadhyay, Minerva; Ghosh, Anil K. A distribution-free two-sample run test applicable to high-dimensional data [J]. BIOMETRIKA, 2014, 101(4): 913-926.
[6] Biswas, Munmun; Ghosh, Anil K. A nonparametric two-sample test applicable to high dimensional data [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2014, 123: 160-171.
[7] Bonetti, M; Pagano, M. The interpoint distance distribution as a descriptor of point patterns, with an application to spatial disease clustering [J]. STATISTICS IN MEDICINE, 2005, 24(5): 753-773.
[8] Bradley, R. C. Introduction to Strong Mixing Conditions [M]. 2007.
[9] Chen, Hao; Friedman, Jerome H. A New Graph-Based Two-Sample Test for Multivariate and Object Data [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2017, 112(517): 397-409.
[10] Dutta, Subhajit; Ghosh, Anil K. On some transformations of high dimension, low sample size data for nearest neighbor classification [J]. MACHINE LEARNING, 2016, 102(1): 57-83.