SPECTRAL REGULARIZED KERNEL TWO-SAMPLE TESTS

Authors
Hagrass, Omar [1]
Sriperumbudur, Bharath K. [1]
Li, Bing [1]
Affiliation
[1] Department of Statistics, Pennsylvania State University, State College, PA 16802, USA
Funding
National Science Foundation (USA)
Keywords
Two-sample test; maximum mean discrepancy; reproducing kernel Hilbert space; permutation test; U-statistics; Bernstein's inequality; spectral regularization; adaptivity; covariance operator; Mercer's theorem; Nyström method
DOI
10.1214/24-AOS2383
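Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714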
Abstract
Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show that the popular MMD (maximum mean discrepancy) two-sample test is not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization that takes into account the covariance information (which is not captured by the MMD test), and we prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test, which involves a data-driven strategy to choose the regularization parameter, and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test, where the test threshold is chosen elegantly through permutation of the samples. Through numerical experiments on synthetic and real data, we demonstrate the superior performance of the proposed test in comparison to the MMD test and other popular tests in the literature.
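The abstract uses the MMD test, calibrated by permuting the pooled samples, as its baseline. The following is a minimal NumPy sketch of that baseline only, assuming a Gaussian kernel with a fixed bandwidth and the unbiased U-statistic estimator of the squared MMD. It does not implement the spectral regularization or the adaptive choice of the regularization parameter proposed in the paper; all function names and default values (bandwidth, number of permutations, level) are hypothetical choices for illustration.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth):
    # Pairwise squared Euclidean distances, then the Gaussian kernel exp(-d^2 / (2 h^2)).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(K, n):
    # K is the Gram matrix of the pooled sample [X; Y]; the first n rows belong to X.
    m = K.shape[0] - n
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    # U-statistic: drop the diagonal (i == j) terms within each sample.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perms=200, alpha=0.05, seed=0):
    # Hypothetical helper: baseline MMD test with a permutation threshold.
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = X.shape[0]
    K = gaussian_gram(Z, Z, bandwidth)  # computed once; permutations only reindex it
    observed = mmd2_unbiased(K, n)
    perm_stats = np.empty(n_perms)
    for b in range(n_perms):
        idx = rng.permutation(Z.shape[0])  # random relabelling of the pooled sample
        perm_stats[b] = mmd2_unbiased(K[np.ix_(idx, idx)], n)
    # Permutation p-value with the +1 correction, valid for a finite number of permutations.
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perms)
    return observed, p_value, p_value <= alpha
```

For example, with `X` drawn from a standard bivariate normal and `Y` from the same distribution shifted by one unit in each coordinate (100 points each), `mmd_permutation_test(X, Y)` rejects the null hypothesis. Reusing one pooled Gram matrix and only permuting its indices keeps each permutation cheap.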
Pages: 1076-1101
Number of pages: 26