K-medoids Clustering Based on MapReduce and Optimal Search of Medoids

Cited by: 0
Authors
Zhu, Ying-ting [1 ]
Wang, Fu-zhang [2 ]
Shan, Xing-hua [2 ]
Lv, Xiao-yan [2 ]
Affiliations
[1] China Acad Railway Sci, Railway Technol Res Coll, Beijing, Peoples R China
[2] China Acad Railway Sci, Inst Comp Technol, Beijing, Peoples R China
Source
2014 PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2014) | 2014
Keywords
MapReduce; k-medoids; parallel algorithm; cluster analysis; data mining
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
The traditional k-medoids algorithm is robust when the data contain noise and outliers, but its high computational cost limits it to small and medium-sized data sets. MapReduce is a programming model for processing massive data and is well suited to parallel computation on big data. This paper therefore proposes an improved k-medoids algorithm for clustering big data, based on MapReduce and an optimal search of medoids. First, using basic properties of triangle geometry, the algorithm reduces the number of distance computations between data elements, which speeds up the medoid search and lowers the computational complexity of k-medoids. Second, following the working principle of MapReduce, the Map function computes the distance between each data element and each medoid and assigns every element to its cluster; the Reduce function checks the output of the Map function, searches for new medoids with the optimal medoid-search strategy, and passes the new medoids to the Map function in the next MapReduce iteration. Experimental results show that the proposed algorithm is both efficient and effective.
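The abstract does not give the exact pruning rule or the job layout, so the following is only a minimal single-process sketch of the described Map/Reduce loop under stated assumptions. It assumes the "properties of triangle geometry" refer to the triangle inequality d(c, q) ≥ |d(m, q) − d(m, c)|, where m is the current medoid, and it simulates the map, shuffle and reduce phases with plain Python functions instead of a real MapReduce framework. The function names `map_phase`, `shuffle`, `reduce_phase` and `kmedoids_mapreduce` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def map_phase(points: Iterable[Point], medoids: Sequence[Point]) -> Iterator[Tuple[int, Point]]:
    """Map (hypothetical): emit (index of nearest medoid, point) for every data element."""
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: math.dist(p, medoids[i]))
        yield nearest, p


def shuffle(pairs: Iterable[Tuple[int, Point]]) -> Dict[int, List[Point]]:
    """Group mapper output by cluster key, as a MapReduce framework would."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return groups


def reduce_phase(members: List[Point], current: Point) -> Tuple[Point, float]:
    """Reduce (hypothetical): return the member with the smallest total distance
    to the rest of its cluster, plus that total distance.

    Assumed pruning rule (triangle inequality): for the current medoid m,
    d(c, q) >= |d(m, q) - d(m, c)|, so sum_q |d(m, q) - d(m, c)| is a lower
    bound on the cost of candidate c and lets us skip hopeless candidates.
    """
    ref = [math.dist(current, q) for q in members]  # one real distance per member
    best, best_cost = current, math.inf
    for i, cand in enumerate(members):
        lower = sum(abs(r - ref[i]) for r in ref)  # scalar work, no new distances
        if lower >= best_cost:
            continue  # cannot beat the best medoid found so far
        cost = sum(math.dist(cand, q) for q in members)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


def kmedoids_mapreduce(points: List[Point], init: Sequence[Point], max_iter: int = 20) -> List[Point]:
    """Driver: repeat map -> shuffle -> reduce until the medoids stop changing."""
    medoids = list(init)
    for _ in range(max_iter):
        groups = shuffle(map_phase(points, medoids))
        new = list(medoids)  # an empty cluster keeps its old medoid
        for k, members in groups.items():
            new[k], _ = reduce_phase(members, medoids[k])
        if new == medoids:
            break
        medoids = new
    return medoids


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmedoids_mapreduce(pts, [(0, 0), (0, 1)]))  # [(0, 0), (10, 10)]
```

In an actual MapReduce deployment, `map_phase` would run on each input split, `reduce_phase` once per cluster key, and each pass of the driver loop would be one MapReduce job. Note that this particular bound only saves full distance evaluations: the scalar sum still costs O(n) per candidate, which pays off mainly when each distance computation is expensive (for example, high-dimensional points).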
Pages: 573-577
Number of pages: 5
Related papers
50 in total
  • [21] An Improvement of K-Medoids Clustering Algorithm Based on Fixed Point Iteration
    Huang, Xiaodi
    Ren, Minglun
    Hu, Zhongfeng
    INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2020, 16 (04) : 84 - 94
  • [22] Proof-of-Activity Consensus Algorithm Based on K-Medoids Clustering
    Wang, Dong
    Jin, Chenguang
    Xiao, Bingbing
    Li, Zheng
    He, Xin
    BIG DATA RESEARCH, 2021, 26
  • [23] Comparative Analysis between K-Means and K-Medoids for Statistical Clustering
    Arbin, Norazam
    Suhaimi, Nur Suhailayani
    Mokhtar, Nurul Zafirah
    Othman, Zalinda
    2015 THIRD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, MODELLING AND SIMULATION (AIMS 2015), 2015, : 117 - 121
  • [24] A Novel Spatial Clustering with Obstacles Constraints Based on PNPSO and K-Medoids
    Zhang, Xueping
    Du, Haohua
    Yang, Tengfei
    Zhao, Guangcai
    ADVANCES IN SWARM INTELLIGENCE, PT 2, PROCEEDINGS, 2010, 6146 : 476 - +
  • [25] Approximate Shortest Distance Computing Using k-Medoids Clustering
    Agarwal, S.
    Mehta, S.
    Annals of Data Science, 2017, 4 (4) : 547 - 564
  • [26] Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
    Schubert, Erich
    Rousseeuw, Peter J.
    SIMILARITY SEARCH AND APPLICATIONS (SISAP 2019), 2019, 11807 : 171 - 187
  • [27] FEATURE SELECTION WITH A SUPERVISED SIMILARITY-BASED K-MEDOIDS CLUSTERING
    Ouyang, Chen-Sen
    PROCEEDINGS OF 2014 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2014, : 562 - 566
  • [28] Shape Clustering Using K-Medoids in Architectural Form Finding
    Yousif, Shermeen
    Yan, Wei
    COMPUTER-AIDED ARCHITECTURAL DESIGN: HELLO, CULTURE, CAAD FUTURES 2019, 2019, 1028 : 459 - 473
  • [29] PAMAE: Parallel k-Medoids Clustering with High Accuracy and Efficiency
    Song, Hwanjun
    Lee, Jae-Gil
    Han, Wook-Shin
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 1087 - 1096
  • [30] An Attributes Similarity-Based K-Medoids Clustering Technique in Data Mining
    G. Surya Narayana
    D. Vasumathi
    Arabian Journal for Science and Engineering, 2018, 43 : 3979 - 3992