On the quality of k-means clustering based on grouped data

Cited by: 2
Authors
Käärik, Meelis [1]
Pärna, Kalev [1]
Affiliation
[1] Univ Tartu, Inst Stat Math, EE-50090 Tartu, Estonia
Keywords
Grouped data; k-Means; Lloyd's algorithm; Loss-function; Voronoi partitions; Quantization
DOI
10.1016/j.jspi.2009.05.021
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e. a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129-136]. However, depending on the complexity of the distribution P, applying this algorithm can be quite resource-consuming. One way to overcome this problem is to group the original data and calculate the k-mean from the grouped data. The resulting k-mean will be biased, and our aim is to measure the loss in approximation quality caused by this approach. (C) 2009 Elsevier B.V. All rights reserved.
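The setting described in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the squared-error loss, the equal-width bins represented by their midpoints, the two-component Gaussian sample, and the helper names `loss`, `lloyd_1d` and `group` are all assumptions made for illustration. The sketch runs Lloyd's iteration once on the raw sample and once on the grouped sample, then evaluates both sets of centers on the raw data.

```python
import random

def loss(points, weights, centers):
    """Weighted mean squared distance to the nearest center (assumed loss)."""
    total = sum(weights)
    return sum(w * min((x - c) ** 2 for c in centers)
               for x, w in zip(points, weights)) / total

def lloyd_1d(points, weights, centers, max_iter=100):
    """Lloyd's iteration on the real line: assign each point to its
    nearest center, then move each center to its cell's weighted mean."""
    centers = sorted(centers)
    for _ in range(max_iter):
        sums = [0.0] * len(centers)
        mass = [0.0] * len(centers)
        for x, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            sums[j] += w * x
            mass[j] += w
        # A center whose Voronoi cell is empty stays where it is.
        new = [s / m if m > 0 else c for s, m, c in zip(sums, mass, centers)]
        if new == centers:
            break
        centers = new
    return centers

def group(points, n_bins):
    """Equal-width grouping (assumed scheme): return the midpoints and
    counts of the non-empty bins. Requires at least two distinct values."""
    lo, hi = min(points), max(points)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in points:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    mids = [lo + (i + 0.5) * width for i in range(n_bins)]
    return ([m for m, c in zip(mids, counts) if c],
            [c for c in counts if c])

# Illustrative sample: two Gaussian clusters, k = 2.
random.seed(0)
data = ([random.gauss(-2, 1) for _ in range(500)]
        + [random.gauss(3, 1) for _ in range(500)])
ones = [1.0] * len(data)
init = [min(data), max(data)]

full = lloyd_1d(data, ones, init)        # centers from the raw data
mids, counts = group(data, 20)           # 20 bins instead of 1000 points
grouped = lloyd_1d(mids, counts, init)   # centers from the grouped data

# Both center sets are evaluated on the ORIGINAL data.
loss_full = loss(data, ones, full)
loss_grouped = loss(data, ones, grouped)
print("centers (raw):    ", full)
print("centers (grouped):", grouped)
print("loss increase due to grouping:", loss_grouped - loss_full)
```

The difference `loss_grouped - loss_full` is one simple empirical measure of the quality lost by grouping. Coarser bins (a smaller `n_bins`) make each Lloyd pass cheaper but generally move the grouped centers further from those obtained on the raw data.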
Pages: 3836-3841
Page count: 6