Ensemble Subsampling for Imbalanced Multivariate Two-Sample Tests

Times cited: 11
Authors
Chen, Lisha [1]
Dou, Winston Wei [2]
Qiao, Zhihua [3]
Affiliations
[1] Yale Univ, Dept Stat, New Haven, CT 06511 USA
[2] MIT, Dept Financial Econ, Cambridge, MA 02139 USA
[3] JPMorgan Chase, Model Risk & Model Dev, New York, NY 10172 USA
Keywords
Corporate finance; Ensemble methods; Imbalanced learning; Kolmogorov-Smirnov test; Nearest neighbors methods; Nonparametric two-sample tests; Subsampling methods; Nearest-neighbor; Distributions
DOI
10.1080/01621459.2013.800763
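Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714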
Abstract
Some existing nonparametric two-sample tests for equality of multivariate distributions perform unsatisfactorily when the two sample sizes are unbalanced; in particular, their power tends to diminish as the sample sizes become increasingly unbalanced. In this article, we propose a new testing procedure that addresses this problem. The proposed test builds on Schilling's nearest-neighbor method and employs a novel ensemble subsampling scheme: the test statistic is a weighted average of a collection of statistics, each associated with a randomly selected subsample of the data. We derive the asymptotic distribution of the test statistic under the null hypothesis and show that the new test is consistent against all alternatives when the ratio of the sample sizes either converges to a finite limit or tends to infinity. Using simulated data examples, we demonstrate that the power of the new test increases with the sample size ratio when the size of the smaller sample is fixed. The test is applied to a real-data example in corporate finance. Supplementary materials for this article are available online.
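The abstract describes the statistic only at a high level, so the following is a hypothetical Python sketch under stated assumptions, not the authors' procedure. It uses a Schilling-style statistic (the fraction of k-nearest-neighbor pairs whose sample labels agree), assumes each subsample cuts the larger sample down to the size of the smaller one, and assumes equal weights in the average. In place of the asymptotic null distribution derived in the article, it calibrates with a permutation test, which is a swapped-in technique. The function names `knn_agreement`, `ensemble_nn_statistic`, and `permutation_pvalue` and all default parameters are illustrative.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def knn_agreement(pooled: List[Point], labels: List[int], k: int) -> float:
    """Schilling-style statistic (sketch): the fraction of (point, neighbor)
    pairs, over the k nearest neighbors of every pooled point, whose sample
    labels agree. Brute-force Euclidean search; ties broken by index."""
    n = len(pooled)
    agree = 0
    for i in range(n):
        neighbors = sorted(
            (math.dist(pooled[i], pooled[j]), j) for j in range(n) if j != i
        )
        for _, j in neighbors[:k]:
            agree += labels[i] == labels[j]
    return agree / (n * k)


def ensemble_nn_statistic(
    x: Sequence[Point],
    y: Sequence[Point],
    k: int = 3,
    n_subsamples: int = 10,
    rng: Optional[random.Random] = None,
) -> float:
    """Hypothetical ensemble: subsample the larger sample down to the size of
    the smaller one, compute knn_agreement on each balanced pooled set, and
    average the results with equal weights (an assumption of this sketch)."""
    if rng is None:
        rng = random.Random(0)
    small, large = (x, y) if len(x) <= len(y) else (y, x)
    if k >= 2 * len(small):
        raise ValueError("k must be smaller than the balanced pooled size")
    stats = []
    for _ in range(n_subsamples):
        sub = rng.sample(list(large), len(small))
        pooled = list(small) + sub
        labels = [0] * len(small) + [1] * len(sub)
        stats.append(knn_agreement(pooled, labels, k))
    return sum(stats) / len(stats)


def permutation_pvalue(
    x: Sequence[Point],
    y: Sequence[Point],
    k: int = 3,
    n_subsamples: int = 10,
    n_perm: int = 99,
    seed: int = 0,
) -> float:
    """One-sided permutation p-value: large agreement is evidence against
    equal distributions. Stands in for the article's asymptotic null law."""
    rng = random.Random(seed)
    observed = ensemble_nn_statistic(x, y, k, n_subsamples, rng)
    pooled = list(x) + list(y)
    exceed = 0
    for _ in range(n_perm):
        # Relabel at random while keeping the original (imbalanced) sizes.
        rng.shuffle(pooled)
        px, py = pooled[: len(x)], pooled[len(x):]
        if ensemble_nn_statistic(px, py, k, n_subsamples, rng) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Large values mean points tend to neighbor their own sample, which is evidence against equal distributions. Averaging over many subsamples lets more of the larger sample contribute than a single balanced subsample would. The brute-force neighbor search keeps the sketch self-contained; a k-d tree would be the usual choice for larger data.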
Pages: 1308-1323
Page count: 16