Quantization/Clustering: when and why does k-means work?

Cited: 0
Authors
Levrard, Clement [1 ]
Affiliations
[1] Univ Paris Diderot, LPMA, 8 Pl Aurélie Nemours, F-75013 Paris, France
Source
JOURNAL OF THE SFDS | 2018, Vol. 159, No. 01
Keywords
k-means; clustering; quantization; separation rate; distortion;
DOI
Not available
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
Though mostly used as a clustering algorithm, k-means was originally designed as a quantization algorithm: it aims at compressing a probability distribution with k points. Building upon Levrard (2015) and Tang and Monteleoni (2016a), we investigate how and when these two approaches are compatible. Namely, we show that, provided the sample distribution satisfies a margin-like condition (in the sense of Mammen and Tsybakov, 1999, for supervised learning), both the associated empirical risk minimizer and the output of Lloyd's algorithm provide almost optimal classification in certain cases (in the sense of Azizyan et al., 2013). Besides, we also show that they achieve fast and optimal convergence rates for the compression risk in terms of the sample size.
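
The abstract contrasts the quantization view of k-means (compressing the sample distribution with a codebook of k points, measured by the distortion) with the clustering view (labels induced by the nearest-center partition). As a minimal illustrative sketch only, not the paper's procedure, the snippet below runs plain Lloyd iterations and reports the empirical distortion they aim to decrease; the two-component Gaussian mixture, the function names, and all parameter choices are assumptions made for this example.

import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    # Plain Lloyd iterations: alternate nearest-center assignment and
    # centroid updates. Returns the codebook (k centers) and the labels
    # induced by the nearest-center partition.
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=k, replace=False)]  # initial codebook
    for _ in range(n_iter):
        # Assignment step: squared distances to every center.
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its cell.
        for j in range(k):
            if np.any(labels == j):
                c[j] = X[labels == j].mean(axis=0)
    return c, labels

def empirical_distortion(X, c):
    # Empirical compression risk: average squared distance to the codebook.
    d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated Gaussian components (assumed toy data).
    X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
                   rng.normal(4.0, 0.5, size=(200, 2))])
    codebook, labels = lloyd(X, k=2)
    print("codebook:\n", codebook)
    print("empirical distortion:", empirical_distortion(X, codebook))

With well-separated components such as these, the nearest-center labels returned alongside the codebook tend to recover the underlying groups, which is informally the regime in which the abstract claims both quantization and classification guarantees.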
Pages: 1 - 26
Number of pages: 26
Related Papers
50 in total
  • [41] Strong Consistency of Reduced K-means Clustering
    Terada, Yoshikazu
    SCANDINAVIAN JOURNAL OF STATISTICS, 2014, 41 (04) : 913 - 931
  • [42] Stable Initialization Scheme for K-Means Clustering
    XU Junling
    Wuhan University Journal of Natural Sciences, 2009, 14 (01) : 24 - 28
  • [43] Mahalanobis Distance Based K-Means Clustering
    Brown, Paul O.
    Chiang, Meng Ching
    Guo, Shiqing
    Jin, Yingzi
    Leung, Carson K.
    Murray, Evan L.
    Pazdor, Adam G. M.
    Cuzzocrea, Alfredo
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2022, 2022, 13428 : 256 - 262
  • [44] K-means*: Clustering by gradual data transformation
    Malinen, Mikko I.
    Mariescu-Istodor, Radu
    Franti, Pasi
    PATTERN RECOGNITION, 2014, 47 (10) : 3376 - 3386
  • [45] Online K-Means Clustering with Lightweight Coresets
    Low, Jia Shun
    Ghafoori, Zahra
    Leckie, Christopher
    AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 191 - 202
  • [46] Scalable Multiple Kernel k-means Clustering
    Lu, Yihang
    Xin, Haonan
    Wang, Rong
    Nie, Feiping
    Li, Xuelong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4279 - 4283
  • [47] Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral
    Wahyuningrum, Tenia
    Khomsah, Siti
    Suyanto, Suyanto
    Meliana, Selly
    Yunanto, Prasti Eko
    Al Maki, Wikky F.
    2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020,
  • [48] Empirical Evaluation of K-Means, Bisecting K-Means, Fuzzy C-Means and Genetic K-Means Clustering Algorithms
    Banerjee, Shreya
    Choudhary, Ankit
    Pal, Somnath
    2015 IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (WIECON-ECE), 2015, : 172 - 176
  • [49] Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering
    Makarychev, Konstantin
    Makarychev, Yury
    Razenshteyn, Ilya
    PROCEEDINGS OF THE 51ST ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '19), 2019, : 1027 - 1038
  • [50] The LINEX Weighted k-Means Clustering
    Ahmadzadehgoli, Narges
    Mohammadpour, Adel
    Behzadi, Mohammad Hassan
    JOURNAL OF STATISTICAL THEORY AND APPLICATIONS, 2019, 18 (02): : 147 - 154