A New Graph-Based Two-Sample Test for Multivariate and Object Data

Cited by: 75

Authors:

Chen, Hao ^{[1]}

Friedman, Jerome H. ^{[2]}

Affiliations:

[1] Univ Calif Davis, Dept Stat, 4218 Math Sci, Davis, CA 95616 USA

[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA

Source:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION | 2017 / Vol. 112 / Issue 517

Funding:

National Science Foundation (USA);

Keywords:

General alternatives; Nonparametrics; Permutation null distribution; Similarity graph; COVARIATE BALANCE; SMIRNOV; DISTRIBUTIONS; NETWORK; SAMPLE;

DOI:

10.1080/01621459.2016.1147356

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208 ; 070103 ; 0714 ;

Abstract:

Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This article presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test uses a common pattern that was overlooked previously, and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large datasets. The new test is illustrated on two applications: The assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
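The abstract does not give the form of the test statistic, so the following is only a rough sketch of the general idea of a graph-based two-sample test with a permutation null, not the authors' procedure. It assumes a k-nearest-neighbour similarity graph built from a pairwise dissimilarity matrix, counts the edges that join two observations from the same sample, and combines the two counts in a quadratic form standardised by moments estimated from label permutations. All function names, the choice of graph, and the parameters `k`, `n_perm`, and `seed` are hypothetical.

```python
import numpy as np

def knn_edges(dist, k=5):
    """Undirected k-nearest-neighbour edges from a pairwise dissimilarity matrix."""
    d = np.asarray(dist, dtype=float).copy()
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]      # indices of the k closest points per row
    edges = {(min(i, int(j)), max(i, int(j)))
             for i in range(d.shape[0]) for j in nbrs[i]}
    return np.array(sorted(edges))           # shape (n_edges, 2)

def within_sample_counts(edges, labels):
    """Number of edges joining two points of sample 1, and two points of sample 2."""
    a, b = labels[edges[:, 0]], labels[edges[:, 1]]
    return np.array([np.sum((a == 1) & (b == 1)),
                     np.sum((a == 2) & (b == 2))], dtype=float)

def graph_two_sample_test(dist, labels, k=5, n_perm=999, seed=0):
    """Permutation p-value for a quadratic form in the within-sample edge counts.

    Assumption: the mean and covariance of the counts are estimated from the
    permutations themselves rather than from analytic formulas.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    edges = knn_edges(dist, k)
    observed = within_sample_counts(edges, labels)
    perms = np.array([within_sample_counts(edges, rng.permutation(labels))
                      for _ in range(n_perm)])
    mu = perms.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(perms, rowvar=False))

    def stat(r):
        diff = r - mu
        return float(diff @ cov_inv @ diff)

    s_obs = stat(observed)
    s_perm = np.array([stat(r) for r in perms])
    p_value = (1 + np.sum(s_perm >= s_obs)) / (n_perm + 1)
    return s_obs, p_value
```

Because the function only needs a dissimilarity matrix, the same sketch applies to non-Euclidean objects such as networks, provided a dissimilarity between them can be computed. Using both within-sample counts, rather than only the number of between-sample edges, is what lets a statistic of this kind react to scale differences as well as location differences.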


Pages: 397-409

Number of pages: 13

