Quantization/Clustering: when and why does k-means work?

Cited: 0
Authors
Levrard, Clement [1 ]
Affiliations
[1] Univ Paris Diderot, LPMA, 8 Pl Aurélie Nemours, F-75013 Paris, France
Source
JOURNAL OF THE SFDS | 2018, Vol. 159, No. 01
Keywords
k-means; clustering; quantization; separation rate; distortion;
DOI
Not available
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
Though mostly used as a clustering algorithm, k-means was originally designed as a quantization algorithm: it aims at compressing a probability distribution with k points. Building upon Levrard (2015) and Tang and Monteleoni (2016a), we investigate how and when these two approaches are compatible. Namely, we show that, provided the sample distribution satisfies a margin-like condition (in the sense of Mammen and Tsybakov, 1999, for supervised learning), both the associated empirical risk minimizer and the output of Lloyd's algorithm provide almost optimal classification in certain cases (in the sense of Azizyan et al., 2013). Besides, we also show that they achieve fast and optimal convergence rates for the compression risk in terms of the sample size.
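
The abstract contrasts the quantization view of k-means (compressing the sample distribution with a codebook of k points, measured by the distortion) with the clustering view (labels induced by the nearest-center partition). As a minimal illustrative sketch only, not the paper's procedure, the snippet below runs plain Lloyd iterations and reports the empirical distortion they aim to decrease; the two-component Gaussian mixture, the function names, and all parameter choices are assumptions made for this example.

import numpy as np

def lloyd(X, k, n_iter=50, seed=0):
    # Plain Lloyd iterations: alternate nearest-center assignment and
    # centroid updates. Returns the codebook (k centers) and the labels
    # induced by the nearest-center partition.
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=k, replace=False)]  # initial codebook
    for _ in range(n_iter):
        # Assignment step: squared distances to every center.
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its cell.
        for j in range(k):
            if np.any(labels == j):
                c[j] = X[labels == j].mean(axis=0)
    return c, labels

def empirical_distortion(X, c):
    # Empirical compression risk: average squared distance to the codebook.
    d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated Gaussian components (assumed toy data).
    X = np.vstack([rng.normal(0.0, 0.5, size=(200, 2)),
                   rng.normal(4.0, 0.5, size=(200, 2))])
    codebook, labels = lloyd(X, k=2)
    print("codebook:\n", codebook)
    print("empirical distortion:", empirical_distortion(X, codebook))

With well-separated components such as these, the nearest-center labels returned alongside the codebook tend to recover the underlying groups, which is informally the regime in which the abstract claims both quantization and classification guarantees.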
Pages: 1 - 26
Number of pages: 26
Related Papers
50 in total
  • [41] Strong Consistency of Reduced K-means Clustering
    Terada, Yoshikazu
    SCANDINAVIAN JOURNAL OF STATISTICS, 2014, 41 (04) : 913 - 931
  • [42] Stable Initialization Scheme for K-Means Clustering
    XU Junling
    Wuhan University Journal of Natural Sciences, 2009, 14 (01) : 24 - 28
  • [43] Mahalanobis Distance Based K-Means Clustering
    Brown, Paul O.
    Chiang, Meng Ching
    Guo, Shiqing
    Jin, Yingzi
    Leung, Carson K.
    Murray, Evan L.
    Pazdor, Adam G. M.
    Cuzzocrea, Alfredo
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2022, 2022, 13428 : 256 - 262
  • [44] K-means*: Clustering by gradual data transformation
    Malinen, Mikko I.
    Mariescu-Istodor, Radu
    Franti, Pasi
    PATTERN RECOGNITION, 2014, 47 (10) : 3376 - 3386
  • [45] Online K-Means Clustering with Lightweight Coresets
    Low, Jia Shun
    Ghafoori, Zahra
    Leckie, Christopher
    AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 191 - 202
  • [46] Scalable Multiple Kernel k-means Clustering
    Lu, Yihang
    Xin, Haonan
    Wang, Rong
    Nie, Feiping
    Li, Xuelong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 4279 - 4283
  • [47] Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral
    Wahyuningrum, Tenia
    Khomsah, Siti
    Suyanto, Suyanto
    Meliana, Selly
    Yunanto, Prasti Eko
    Al Maki, Wikky F.
    2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020,
  • [48] Empirical Evaluation of K-Means, Bisecting K-Means, Fuzzy C-Means and Genetic K-Means Clustering Algorithms
    Banerjee, Shreya
    Choudhary, Ankit
    Pal, Somnath
    2015 IEEE INTERNATIONAL WIE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (WIECON-ECE), 2015, : 172 - 176
  • [49] Performance of Johnson-Lindenstrauss Transform for k-Means and k-Medians Clustering
    Makarychev, Konstantin
    Makarychev, Yury
    Razenshteyn, Ilya
    PROCEEDINGS OF THE 51ST ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '19), 2019, : 1027 - 1038
  • [50] The LINEX Weighted k-Means Clustering
    Ahmadzadehgoli, Narges
    Mohammadpour, Adel
    Behzadi, Mohammad Hassan
    JOURNAL OF STATISTICAL THEORY AND APPLICATIONS, 2019, 18 (02): : 147 - 154