A New Graph-Based Two-Sample Test for Multivariate and Object Data

Cited by: 75
Authors
Chen, Hao [1]
Friedman, Jerome H. [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, 4218 Math Sci, Davis, CA 95616 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
General alternatives; Nonparametrics; Permutation null distribution; Similarity graph; COVARIATE BALANCE; SMIRNOV; DISTRIBUTIONS; NETWORK; SAMPLE;
DOI
10.1080/01621459.2016.1147356
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This article presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test uses a common pattern that was overlooked previously, and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large datasets. The new test is illustrated on two applications: the assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
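The abstract does not spell out the test statistic, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the similarity graph is a minimum spanning tree on the pooled observations, that the statistic is a quadratic form in the two within-sample edge counts, and that the null distribution is approximated by Monte Carlo permutation of the sample labels rather than by the asymptotic result the abstract mentions. All function names (`mst_edges`, `within_counts`, `graph_two_sample_test`) and parameters are hypothetical.

```python
import math
import random


def mst_edges(points, dist):
    """Prim's algorithm: return the edges (i, j) of a minimum spanning tree."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection cost for each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def within_counts(edges, labels):
    """Count edges joining two sample-1 nodes (r1) and two sample-2 nodes (r2)."""
    r1 = sum(1 for i, j in edges if labels[i] == 0 and labels[j] == 0)
    r2 = sum(1 for i, j in edges if labels[i] == 1 and labels[j] == 1)
    return r1, r2


def graph_two_sample_test(x, y, dist, n_perm=999, seed=0):
    """Return (statistic, permutation p-value) for samples x and y.

    Assumption: the statistic is (R - m)^T V^{-1} (R - m), where R holds the
    two within-sample edge counts and m, V are estimated from permutations.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    edges = mst_edges(pooled, dist)       # graph is fixed; only labels permute
    obs = within_counts(edges, labels)

    perms = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        perms.append(within_counts(edges, shuffled))

    # Empirical permutation mean and 2x2 covariance of (r1, r2).
    m1 = sum(p[0] for p in perms) / n_perm
    m2 = sum(p[1] for p in perms) / n_perm
    v11 = sum((p[0] - m1) ** 2 for p in perms) / (n_perm - 1)
    v22 = sum((p[1] - m2) ** 2 for p in perms) / (n_perm - 1)
    v12 = sum((p[0] - m1) * (p[1] - m2) for p in perms) / (n_perm - 1)
    det = v11 * v22 - v12 * v12
    if det <= 0:
        raise ValueError("degenerate permutation covariance; use more data")

    def stat(r):
        d1, d2 = r[0] - m1, r[1] - m2
        # Quadratic form with the explicit inverse of the 2x2 covariance.
        return (v22 * d1 * d1 - 2 * v12 * d1 * d2 + v11 * d2 * d2) / det

    s_obs = stat(obs)
    exceed = sum(1 for p in perms if stat(p) >= s_obs)
    return s_obs, (1 + exceed) / (n_perm + 1)
```

Because `dist` is passed in as a function, the same sketch applies to any objects for which a dissimilarity can be computed, for example `math.dist` for Euclidean vectors or a user-supplied distance between networks.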
Pages: 397-409
Number of pages: 13
Related papers
50 in total
  • [21] Transportation Object Counting With Graph-Based Adaptive Auxiliary Learning
    Meng, Yanda
    Bridge, Joshua
    Zhao, Yitian
    Joddrell, Martha
    Qiao, Yihong
    Yang, Xiaoyun
    Huang, Xiaowei
    Zheng, Yalin
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (03) : 3422 - 3437
  • [22] Graph-based method for human-object interactions detection
    Xia, Li-min
    Wu, Wei
    JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2021, 28 (01) : 205 - 218
  • [23] Two-sample homogeneity tests based on divergence measures
    Wornowizki, Max
    Fried, Roland
    COMPUTATIONAL STATISTICS, 2016, 31 (01) : 291 - 313
  • [24] Rank Tests for Two-Sample Problems Based on Multiple Type-II Censored Data
    Chikkagoudar, M. S.
    Biradar, B. S.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2010, 39 (18) : 3203 - 3221
  • [25] On high dimensional two-sample tests based on nearest neighbors
    Mondal, Pronoy K.
    Biswas, Munmun
    Ghosh, Anil K.
    JOURNAL OF MULTIVARIATE ANALYSIS, 2015, 141 : 168 - 178
  • [26] Two-sample density-based empirical likelihood tests for incomplete data in application to a pneumonia study
    Vexler, Albert
    Yu, Jihnhee
    BIOMETRICAL JOURNAL, 2011, 53 (04) : 628 - 651
  • [27] Biospytial: spatial graph-based computing for ecological Big Data
    Molgora, Juan M. Escamilla
    Sedda, Luigi
    Atkinson, Peter M.
    GIGASCIENCE, 2020, 9 (05):
  • [28] Maximum of the weighted Kaplan-Meier tests for the two-sample censored data
    Lee, Seung-Hwan
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2011, 81 (08) : 1017 - 1026
  • [29] A class of nonparametric tests for the two-sample problem based on order statistics
    Karakaya, Kadir
    Sert, Sumeyra
    Abusaif, Ihab
    Kus, Coskun
    Ng, Hon Keung Tony
    Nagaraja, Haikady N.
    JOURNAL OF NONPARAMETRIC STATISTICS, 2025, 37 (01) : 230 - 263
  • [30] Two-sample nonparametric test for comparing mean time to failure functions in age replacement
    Bhattacharyya, Dhrubasish
    Khan, Ruhul Ali
    Mitra, Murari
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2021, 212 : 34 - 44