A New Graph-Based Two-Sample Test for Multivariate and Object Data

Cited by: 75
Authors
Chen, Hao [1]
Friedman, Jerome H. [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, 4218 Math Sci, Davis, CA 95616 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
General alternatives; Nonparametrics; Permutation null distribution; Similarity graph; COVARIATE BALANCE; SMIRNOV; DISTRIBUTIONS; NETWORK; SAMPLE;
DOI
10.1080/01621459.2016.1147356
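Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract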
Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This article presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test uses a common pattern that was overlooked previously, and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large datasets. The new test is illustrated on two applications: The assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
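As a rough illustration of the idea in the abstract, the sketch below builds a similarity graph on the pooled observations and compares the two within-sample edge counts against their permutation distribution. It is a minimal sketch under stated assumptions, not the authors' implementation: the graph is taken to be a minimum spanning tree, the mean and covariance of the edge counts are estimated by Monte Carlo permutation rather than from the asymptotic formulas the article derives, and the function names (`mst_edges`, `within_sample_counts`, `graph_two_sample_test`) are hypothetical. The test takes a precomputed dissimilarity matrix, so any dissimilarity measure could be substituted for the Euclidean distance used in the usage lines.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_edges(dist):
    """Return an (n - 1, 2) array of node pairs forming a minimum spanning tree.

    Assumption: all off-diagonal dissimilarities are strictly positive,
    because SciPy treats zero entries of a dense matrix as missing edges.
    """
    tree = minimum_spanning_tree(dist).tocoo()
    return np.column_stack([tree.row, tree.col])


def within_sample_counts(edges, labels):
    """Count edges joining two sample-0 nodes and edges joining two sample-1 nodes."""
    a, b = labels[edges[:, 0]], labels[edges[:, 1]]
    r_first = np.sum((a == 0) & (b == 0))
    r_second = np.sum((a == 1) & (b == 1))
    return np.array([r_first, r_second], dtype=float)


def graph_two_sample_test(dist, labels, n_perm=500, seed=0):
    """Permutation test on the two within-sample edge counts of an MST.

    Returns a quadratic-form statistic and a Monte Carlo permutation p-value.
    """
    rng = np.random.default_rng(seed)
    edges = mst_edges(dist)
    observed = within_sample_counts(edges, labels)

    # The graph is fixed; only the sample labels are permuted.
    perms = np.array(
        [within_sample_counts(edges, rng.permutation(labels)) for _ in range(n_perm)]
    )
    mean = perms.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(perms, rowvar=False))

    def quadratic_form(counts):
        d = counts - mean
        return float(d @ cov_inv @ d)

    s_obs = quadratic_form(observed)
    s_perm = np.array([quadratic_form(r) for r in perms])
    p_value = (1 + np.sum(s_perm >= s_obs)) / (n_perm + 1)
    return s_obs, p_value


# Usage: two Gaussian samples that differ in scale only.
rng = np.random.default_rng(42)
x = rng.normal(scale=1.0, size=(50, 5))
y = rng.normal(scale=2.0, size=(50, 5))
dist = squareform(pdist(np.vstack([x, y])))
labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
statistic, p_value = graph_two_sample_test(dist, labels)
print(f"statistic = {statistic:.2f}, permutation p-value = {p_value:.3f}")
```

Looking at both within-sample counts jointly, rather than only the number of between-sample edges, is what lets a test of this kind react to a location shift (both counts rise) as well as a scale change (one count rises while the other falls).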
Pages: 397-409
Number of pages: 13
Related papers
50 in total
  • [31] One- and two-sample Bayesian prediction intervals based on progressively Type-II censored data
    El-Din, M. M. Mohie
    Shafay, A. R.
    STATISTICAL PAPERS, 2013, 54 (02) : 287 - 307
  • [32] One- and Two-Sample Bayesian Prediction Intervals Based on Type-II Hybrid Censored Data
    Balakrishnan, N.
    Shafay, A. R.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2012, 41 (09) : 1511 - 1531
  • [33] A new flexible Bayesian hypothesis test for multivariate data
    Gutierrez, Ivan
    Gutierrez, Luis
    Alvares, Danilo
    STATISTICS AND COMPUTING, 2023, 33 (02)
  • [34] Two-sample test for high-dimensional covariance matrices: A normal-reference approach
    Wang, Jingyi
    Zhu, Tianming
    Zhang, Jin-Ting
    JOURNAL OF MULTIVARIATE ANALYSIS, 2024, 204
  • [35] Two-sample homogeneity testing: A procedure based on comparing distributions of interpoint distances
    Montero-Manso, Pablo
    Vilar, Jose A.
    STATISTICAL ANALYSIS AND DATA MINING, 2019, 12 (03) : 234 - 252
  • [36] Graph-Based Fusion of Imaging, Genetic and Clinical Data for Degenerative Disease Diagnosis
    Guo, Rui
    Tian, Xu
    Lin, Hanhe
    McKenna, Stephen
    Li, Hong-Dong
    Guo, Fei
    Liu, Jin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2024, 21 (01) : 57 - 68
  • [37] Semi-parametric hybrid empirical likelihood inference for two-sample comparison with censored data
    Su, Haiyan
    Zhou, Mai
    Liang, Hua
    LIFETIME DATA ANALYSIS, 2011, 17 (04) : 533 - 551
  • [38] Robust rank-based meta-analyses for two-sample designs with application to platelet counts of malaria infection data
    Lang, Yanda
    McKean, Joseph W.
    Ozturk, Omer
    STATISTICS IN MEDICINE, 2023, 42 (17) : 2887 - 2913
  • [39] Two-sample nonparametric prediction intervals based on random number of generalized order statistics
    Barakat, H. M.
    El-Adll, Magdy E.
    Aly, Amany E.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2021, 50 (19) : 4571 - 4586
  • [40] A Framework for Mining Life Sciences Data on the Semantic Web in an Interactive, Graph-Based Environment
    Lysenko, Artem
    Grzebyta, Jacek
    Hindle, Matthew M.
    Rawlings, Chris J.
    Splendiani, Andrea
    COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS: 10TH INTERNATIONAL MEETING, 2014, 8452 : 225 - 237