K-medoids Clustering Based on MapReduce and Optimal Search of Medoids

Cited by: 0
Authors
Zhu, Ying-ting [1 ]
Wang, Fu-zhang [2 ]
Shan, Xing-hua [2 ]
Lv, Xiao-yan [2 ]
Affiliations
[1] China Acad Railway Sci, Railway Technol Res Coll, Beijing, Peoples R China
[2] China Acad Railway Sci, Inst Comp Technol, Beijing, Peoples R China
Source
2014 PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2014) | 2014
Keywords
MapReduce; k-medoids; parallel algorithm; cluster analysis; data mining;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The traditional k-medoids algorithm is robust to noise and outliers in the data, but its computational cost makes it suitable only for small and medium-sized data sets. MapReduce is a programming model for processing massive data and is well suited to parallel computation on big data. This paper therefore proposes an improved k-medoids algorithm, based on MapReduce and an optimal search of medoids, for clustering big data. First, using basic properties of triangular geometry, the paper reduces the number of distance calculations among data elements, which speeds up the search for medoids and lowers the computational complexity of k-medoids. Second, following the working principle of MapReduce, the Map function calculates the distance between each data element and the medoids and assigns each element to its cluster; the Reduce function checks the results from the Map function, searches for new medoids using the optimal medoid search strategy, and returns the new results to the Map function for the next MapReduce round. The experimental results show that the proposed algorithm is efficient and effective.
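The two steps described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`map_assign`, `reduce_update`, `k_medoids`) are hypothetical, the triangle-inequality pruning rule is one common way to skip distance calculations and is assumed here rather than taken from the paper, and the Reduce step uses a brute-force medoid search as a stand-in for the paper's optimal search strategy, which the abstract does not specify.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)


def map_assign(points, medoids):
    """Map step (sketch): emit (medoid_index, point) pairs.

    Assumed pruning rule: if d(m_best, m_j) >= 2 * d(p, m_best), the
    triangle inequality gives d(p, m_j) >= d(p, m_best), so the distance
    to m_j does not need to be computed.
    """
    k = len(medoids)
    # Medoid-to-medoid distances are computed once per round.
    mm = [[dist(medoids[i], medoids[j]) for j in range(k)] for i in range(k)]
    pairs = []
    for p in points:
        best, best_d = 0, dist(p, medoids[0])
        for j in range(1, k):
            if mm[best][j] >= 2 * best_d:
                continue  # pruned: m_j cannot be strictly closer
            d = dist(p, medoids[j])
            if d < best_d:
                best, best_d = j, d
        pairs.append((best, p))
    return pairs


def reduce_update(pairs, medoids):
    """Reduce step (sketch): choose a new medoid for each cluster.

    Brute-force stand-in: the member with the smallest total distance
    to the other members of its cluster becomes the new medoid.
    """
    clusters = defaultdict(list)
    for idx, p in pairs:
        clusters[idx].append(p)
    new_medoids = []
    for idx, old in enumerate(medoids):
        members = clusters.get(idx)
        if not members:
            new_medoids.append(old)  # keep the old medoid for an empty cluster
            continue
        new_medoids.append(
            min(members, key=lambda c: sum(dist(c, q) for q in members))
        )
    return new_medoids


def k_medoids(points, medoids, max_iter=20):
    """Alternate Map and Reduce rounds until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        new_medoids = reduce_update(map_assign(points, medoids), medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Final assignment against the medoids that are returned.
    return medoids, map_assign(points, medoids)
```

In a real MapReduce job the pairs emitted by the Map step would be grouped by key across machines before reaching the Reduce step; the in-memory `defaultdict` above only mimics that shuffle for illustration.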
Pages: 573-577
Page count: 5
Related papers
50 in total
[31]   Improved Initialization Method for Simple and Fast K-medoids Clustering [J].
Kim, Sung-Soo ;
Kang, Bum-Su .
INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2023, 22 (01) :63-72
[32]   Bearing performance degradation assessment based on Renyi entropy and K-medoids clustering [J].
Zhang L. ;
Song C. ;
Zou Y. ;
Hong C. ;
Wang C. .
Zhendong yu Chongji/Journal of Vibration and Shock, 2020, 39 (20) :24-31 and 46
[33]   An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining [J].
G. Surya Narayana ;
D. Vasumathi .
Arabian Journal for Science and Engineering, 2018, 43 :3979-3992
[34]   An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining [J].
Narayana, G. Surya ;
Vasumathi, D. .
ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2018, 43 (08) :3979-3992
[35]   k-MM: A Hybrid Clustering Algorithm Based on k-Means and k-Medoids [J].
Drias, Habiba ;
Cherif, Nadjib Fodil ;
Kechid, Amine .
ADVANCES IN NATURE AND BIOLOGICALLY INSPIRED COMPUTING, 2016, 419 :37-48
[36]   K-Medoids and K-Means Clustering in High School Teacher Distribution [J].
Widiyaningtyas, Triyanna ;
Pujianto, Utomo ;
Prabowo, Martin Indra Wisnu .
2019 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND INFORMATION ENGINEERING (ICEEIE), 2019, :330-335
[37]   k-Medoids Clustering Based on Kernel Density Estimation and Jensen-Shannon Divergence [J].
Hamasuna, Yukihiro ;
Kingetsu, Yuto ;
Nakano, Shusuke .
MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE (MDAI 2019), 2019, 11676 :272-282
[38]   Application of the k-medoids Partitioning Algorithm for Clustering of Time Series Data [J].
Radovanovic, Ana ;
Ye, Xinlin ;
Milanovic, Jovica, V ;
Milosavljevic, Nina ;
Storchi, Riccardo .
2020 IEEE PES INNOVATIVE SMART GRID TECHNOLOGIES EUROPE (ISGT-EUROPE 2020): SMART GRIDS: KEY ENABLERS OF A GREEN POWER SYSTEM, 2020, :645-649
[39]   Advancing the PAM Algorithm to Semi-supervised k-Medoids Clustering [J].
Janosova, Miriama ;
Lang, Andreas ;
Budikova, Petra ;
Schubert, Erich ;
Dohnal, Vlastislav .
SIMILARITY SEARCH AND APPLICATIONS, SISAP 2024, 2025, 15268 :223-237
[40]   Identification method of site micro-vibration source based on K-medoids clustering [J].
Zhang J. ;
Yu C. ;
Wang P. ;
Sun C. .
Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2022, 43 (11) :113-122