Quantization/Clustering: when and why does k-means work?

被引:0
作者
Levrard, Clement [1 ]
机构
[1] Univ Paris Diderot, LPMA, 8 Pl Aure Lie Nemours, F-75013 Paris, France
来源
JOURNAL OF THE SFDS | 2018年 / 159卷 / 01期
关键词
k-means; clustering; quantization; separation rate; distortion;
D O I
暂无
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
Though mostly used as a clustering algorithm, k-means is originally designed as a quantization algorithm. Namely, it aims at providing a compression of a probability distribution with k points. Building upon Levrard (2015); Tang and Monteleoni (2016a), we try to investigate how and when these two approaches are compatible. Namely, we show that provided the sample distribution satisfies a margin like condition (in the sense of Mammen and Tsybakov, 1999 for supervised learning), both the associated empirical risk minimizer and the output of Lloyd's algorithm provide almost optimal classification in certain cases (in the sense of Azizyan et al., 2013). Besides, we also show that they achieved fast and optimal convergence rates in terms of sample size and compression risk.
引用
收藏
页码:1 / 26
页数:26
相关论文
共 36 条
[1]  
Antoniadis A., 2011, RR7515
[2]   Individual convergence rates in empirical vector quantizer design [J].
Antos, A ;
Györfi, L ;
György, A .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2005, 51 (11) :4013-4022
[3]  
Arthur D., 2007, P 18 ANN ACM SIAM S, DOI DOI 10.1145/1283383.1283494
[4]   Projection-based curve clustering [J].
Auder, Benjamin ;
Fischer, Aurelie .
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2012, 82 (08) :1145-1168
[5]  
Awasthi Pranjal, 2012, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Proceedings 15th International Workshop, APPROX 2012 and 16th International Workshop, RANDOM 2012, P37, DOI 10.1007/978-3-642-32512-0_4
[6]  
Azizyan M., 2013, ADV NEURAL INFORM PR, V26, P2139
[7]   On the performance of clustering in Hilbert spaces [J].
Biau, Gerard ;
Devroye, Luc ;
Lugosi, Gabor .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2008, 54 (02) :781-790
[8]  
Brezis H., 2010, FUNCTIONAL ANAL SOBO
[9]  
Bunea F., 2016, ARXIV E PRINTS
[10]   A CLASSIFICATION EM ALGORITHM FOR CLUSTERING AND 2 STOCHASTIC VERSIONS [J].
CELEUX, G ;
GOVAERT, G .
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 1992, 14 (03) :315-332