ROBUST BREGMAN CLUSTERING

被引:3
作者
Brecheteau, Claire [1 ]
Fischer, Aurelie [2 ]
Levrard, Clement [2 ]
机构
[1] Univ Rennes 2, Rennes, France
[2] Univ Paris, Lab Probabilites Stat & Modelisat, Paris, France
关键词
Clustering; Bregman divergence; robustness; TRIMMING APPROACH; K-MEANS; QUANTIZATION; SQUARES;
D O I
10.1214/20-AOS2018
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Clustering with Bregman divergences encompasses a wide family of clustering procedures that are well suited to mixtures of distributions from exponential families (J. Mach. Learn. Res. 6 (2005) 1705-1749). However, these techniques are highly sensitive to noise. To address the issue of clustering data with possibly adversarial noise, we introduce a robustified version of Bregman clustering based on a trimming approach. We investigate its theoretical properties, showing for instance that our estimator converges at a sub-Gaussian rate 1/root n, where n denotes the sample size, under mild tail assumptions. We also show that it is robust to a certain amount of noise, stated in terms of breakdown point. We also derive a Lloyd-type algorithm with a trimming parameter, along with a heuristic to select this parameter and the number of clusters from sample. Some numerical experiments assess the performance of our method on simulated and real datasets.
引用
收藏
页码:1679 / 1701
页数:23
相关论文
共 37 条
[1]  
[Anonymous], 1970, CONVEX ANAL
[2]  
Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
[3]  
ATONI O., 2018, DIMENSION FREE PAC B
[4]  
Banerjee A, 2005, J MACH LEARN RES, V6, P1705
[5]   On the optimality of conditional expectation as a Bregman predictor [J].
Banerjee, A ;
Guo, X ;
Wang, H .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (07) :2664-2669
[6]   On the performance of clustering in Hilbert spaces [J].
Biau, Gerard ;
Devroye, Luc ;
Lugosi, Gabor .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2008, 54 (02) :781-790
[7]  
BRECHETEAU C., 2021, ROBUST BREGMAN CLU S, DOI [10.1214/20-AOS2018SUPP, DOI 10.1214/20-AOS2018SUPP]
[8]  
Bregman L., 1967, COMP MATH MATH PHYS+, V7, P620
[9]   EMPIRICAL RISK MINIMIZATION FOR HEAVY-TAILED LOSSES [J].
Brownlees, Christian ;
Joly, Emilien ;
Lugosi, Gabor .
ANNALS OF STATISTICS, 2015, 43 (06) :2507-2536
[10]   Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm [J].
Cardot, Herve ;
Cenac, Peggy ;
Zitt, Pierre-Andre .
BERNOULLI, 2013, 19 (01) :18-43