GRAPH-BASED TWO-SAMPLE TESTS FOR DATA WITH REPEATED OBSERVATIONS

Cited by: 7
Authors
Zhang, Jingru [1]
Chen, Hao [2]
Affiliations
[1] Univ Penn, Dept Biostat Epidemiol & Informat, 423 Guardian Dr, Philadelphia, PA 19104 USA
[2] Univ Calif Davis, Dept Stat, One Shields Ave, Davis, CA 95616 USA
Keywords
High-dimensional data; network data; non-Euclidean data; nonparametric test; similarity graph; ties in distance; fewer observations; multivariate
DOI
10.5705/ss.202019.0116
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
For two-sample comparisons, tests based on graphs constructed from the similarity information between observations are gaining attention, owing to their flexibility and good performance for high-dimensional or non-Euclidean data. However, when there are repeated observations, these graph-based tests can be problematic because their results depend on the choice of the similarity graph, which is not unique when repeated observations create ties in distance. We propose extended graph-based test statistics to resolve this problem. We also study the asymptotic properties of these extended statistics and provide analytic formulae to approximate the p-values of the tests under finite samples, facilitating the application of the new tests in practice. The proposed tests are applied to analyze a phone-call network data set. All tests are implemented in the R package gTests.
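The dependence on the similarity graph can be seen with the classic edge-count statistic of Friedman and Rafsky (1979), one of the graph-based tests this line of work builds on: construct a minimum spanning tree (MST) on the pooled sample and count the edges that join observations from different samples. The sketch below is an illustration only, not the authors' extended statistics or the gTests implementation. It assumes Euclidean distance and Kruskal's algorithm, and the function names `mst_edges` and `edge_count` are hypothetical. With repeated observations, distances tie, several MSTs are equally minimal, and the statistic can change merely with the order in which the same data are listed.

```python
import itertools
import math


def mst_edges(points):
    """Return the edges of a minimum spanning tree (Kruskal's algorithm)
    on the complete graph over `points`, weighted by Euclidean distance.

    Python's sort is stable, so pairs at equal distance are taken in the
    order the observations are listed: with ties, that order decides
    which of several equally minimal trees is returned.
    """
    n = len(points)
    pairs = sorted(
        itertools.combinations(range(n), 2),
        key=lambda p: math.dist(points[p[0]], points[p[1]]),
    )
    parent = list(range(n))  # union-find forest over observation indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree


def edge_count(points, labels):
    """Number of MST edges joining observations from different samples."""
    return sum(labels[i] != labels[j] for i, j in mst_edges(points))


# The same data, X = {0, 1} and Y = {0, 1}, listed in two different orders
# (label 0 = sample X, label 1 = sample Y); each value is repeated.
order_a = ([(0.0,), (0.0,), (1.0,), (1.0,)], [0, 1, 0, 1])
order_b = ([(0.0,), (1.0,), (0.0,), (1.0,)], [0, 1, 1, 0])

print(edge_count(*order_a))  # 2
print(edge_count(*order_b))  # 3
```

Both listings describe the same two samples, yet the edge count is 2 in one case and 3 in the other, because the tie among the distance-1 edges is broken differently. Any procedure that picks one MST arbitrarily inherits this ambiguity. The extended statistics proposed in the paper are designed to remove it; `math.dist` requires Python 3.8 or later.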
Pages: 391-415
Number of pages: 25