Improvement of the Fast Clustering Algorithm Improved by K-Means in the Big Data

Cited by: 51
Authors
Xie, Ting [1]
Liu, Ruihua [2]
Wei, Zhengyuan [1]
Affiliations
[1] Chongqing Univ Technol, Coll Sci, Chongqing 400054, Peoples R China
[2] Chongqing Univ Technol, Coll Artificial Intelligence, Chongqing 400054, Peoples R China
Keywords
Big Data; Clustering; K-means; Feature space; NONNEGATIVE MATRIX FACTORIZATION; SHIFT; EXTRACTION;
DOI
10.2478/AMNS.2020.1.00001
Chinese Library Classification number
O29 [Applied Mathematics];
Discipline classification code
070104;
Abstract
Clustering is a fundamental unsupervised learning task and an important method of data analysis, and K-means is demonstrably the most popular clustering algorithm. In this paper, we consider clustering on the feature space to address the low efficiency of K-means on Big Data. Unlike traditional methods, the proposed algorithm keeps the clustering accuracy consistent before and after dimension reduction, accelerates K-means when the cluster centers and distance functions satisfy certain conditions, fully matches the preprocessing step to the clustering step, and improves both efficiency and accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm.
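The general idea in the abstract, reducing dimension first and then running K-means in the reduced feature space, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain PCA (via SVD) as the preprocessing step and standard Lloyd iterations with Euclidean distance, and the function names, synthetic data, and parameter values are hypothetical choices made for illustration only.

```python
import numpy as np


def reduce_dimension(X, n_components):
    """Project the rows of X onto the top principal directions (plain PCA via SVD).

    Assumption: PCA stands in for the paper's feature-space mapping.
    """
    X_centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Standard Lloyd iterations with Euclidean distance (not the accelerated variant)."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster became empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def make_two_blobs(n_per_cluster=200, dim=50, seed=0):
    """Hypothetical synthetic data: two well-separated Gaussian blobs."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=0.0, scale=1.0, size=(n_per_cluster, dim))
    b = rng.normal(loc=6.0, scale=1.0, size=(n_per_cluster, dim))
    return np.vstack([a, b])


X = make_two_blobs()                   # 400 points in 50 dimensions
Z = reduce_dimension(X, n_components=2)  # same points in 2 dimensions
labels, centers = kmeans(Z, k=2)       # cluster in the reduced space
```

Each Lloyd iteration costs time proportional to the number of points times the number of clusters times the dimension, so clustering `Z` instead of `X` lowers the per-iteration cost here by the ratio of the dimensions (50 to 2). Whether accuracy is preserved depends on the reduction used, which is the question the paper addresses.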
Pages: 1-10
Page count: 10
Related papers
38 items in total
[1]  
[Anonymous], 1965, ISODATA NOVEL METHOD
[2]  
Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
[3]  
Bachem O., 2016, IEEE 30 C NEUR INF P, P76
[4]  
Banerjee A, 2004, SIAM PROC S, P234
[5]   Multi-view low-rank sparse subspace clustering [J].
Brbic, Maria ;
Kopriva, Ivica .
PATTERN RECOGNITION, 2018, 73 :247-258
[6]   Large Scale Spectral Clustering Via Landmark-Based Sparse Representation [J].
Cai, Deng ;
Chen, Xinlei .
IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (08) :1669-1680
[7]   A comparative study of efficient initialization methods for the k-means clustering algorithm [J].
Celebi, M. Emre ;
Kingravi, Hassan A. ;
Vela, Patricio A. .
EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (01) :200-210
[8]   Mean shift: A robust approach toward feature space analysis [J].
Comaniciu, D ;
Meer, P .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (05) :603-619
[9]  
Dhillon I.S., 2004, P 10 ACM SIGKDD INT, P551-556, DOI 10.1145/1014052.1014118
[10]   Concept decompositions for large sparse text data using clustering [J].
Dhillon, IS ;
Modha, DS .
MACHINE LEARNING, 2001, 42 (1-2) :143-175