On the quality of k-means clustering based on grouped data

Cited by: 2

Authors:

Kaeaerik, Meelis [1]

Paerna, Kalev [1]

Affiliations:

[1] Univ Tartu, Inst Stat Math, EE-50090 Tartu, Estonia

Source:

JOURNAL OF STATISTICAL PLANNING AND INFERENCE | 2009 / Vol. 139 / No. 11

Keywords:

Grouped data; k-Means; Lloyd's algorithm; Loss-function; Voronoi partitions; Quantization

DOI:

10.1016/j.jspi.2009.05.021

Chinese Library Classification (CLC):

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

Let P be a probability distribution (possibly empirical) on the real line R. Consider the problem of finding the k-mean of P, i.e., a set A of at most k points that minimizes a given loss function. It is known that the k-mean can be found using an iterative algorithm by Lloyd [1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129-136]. However, depending on the complexity of the distribution P, applying this algorithm can be quite resource-consuming. One way to overcome the problem is to group the original data and calculate the k-mean on the basis of the grouped data. As a result, the new k-mean will be biased, and our aim is to measure the loss in the quality of approximation caused by such an approach. (C) 2009 Elsevier B.V. All rights reserved.
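
The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions that the abstract does not state: the loss is taken to be the mean squared distance to the nearest point of A, grouping is done with equal-width bins, and each bin is represented by its midpoint weighted by its count. The names `lloyd_1d`, `loss`, and `group` are hypothetical helpers introduced only for this illustration. Note also that a Lloyd-type iteration is only guaranteed to reach a local minimum of the loss, so the comparison printed at the end is indicative rather than exact.

```python
import numpy as np

def lloyd_1d(points, weights, k, n_iter=100, seed=0):
    """Weighted Lloyd-type iteration on the real line (squared-error loss assumed)."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct positions drawn from the data.
    centers = np.sort(rng.choice(points, size=k, replace=False))
    for _ in range(n_iter):
        # Voronoi step: assign every point to its nearest center.
        labels = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            # Centroid step: weighted mean of the cell; empty cells keep their center.
            if weights[mask].sum() > 0:
                new_centers[j] = np.average(points[mask], weights=weights[mask])
        new_centers.sort()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def loss(points, weights, centers):
    """Weighted mean squared distance from each point to its nearest center."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d2 = np.min((points[:, None] - centers[None, :]) ** 2, axis=1)
    return float(np.sum(weights * d2) / np.sum(weights))

def group(points, n_bins):
    """Group data into equal-width bins; return bin midpoints and counts (assumption)."""
    counts, edges = np.histogram(points, bins=n_bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    return mids[keep], counts[keep].astype(float)

# Compare centers fitted on raw data with centers fitted on grouped data,
# always evaluating the loss on the original (ungrouped) sample.
rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
w = np.ones_like(x)
centers_full = lloyd_1d(x, w, k=3)
gx, gw = group(x, n_bins=50)
centers_grouped = lloyd_1d(gx, gw, k=3)
print("loss with centers from raw data:    ", loss(x, w, centers_full))
print("loss with centers from grouped data:", loss(x, w, centers_grouped))
```

The grouped run iterates over at most 50 weighted points instead of 10,000, which is where the saving in resources comes from; the difference between the two printed losses is one way to look at the quality loss that the paper sets out to measure.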

Pages: 3836-3841

Number of pages: 6
