Apache Mahout's k-Means vs. Fuzzy k-Means Performance Evaluation

Cited by: 0
Authors
Xhafa, Fatos [1, 3]
Bogza, Adriana [1 ]
Caballe, Santi [2 ]
Barolli, Leonard
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Univ Oberta Catalunya, Barcelona, Spain
[3] Univ Oberta Catalunya, SmartLearn Grp, Barcelona, Spain
Source
2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT NETWORKING AND COLLABORATIVE SYSTEMS (INCOS) | 2016
Keywords
Data Mining Algorithms; Apache Mahout; Big Data; k-Means; Fuzzy k-Means; Performance; Hadoop Cluster;
DOI
10.1109/INCoS.2016.103
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many issues concerning how to extract and make use of knowledge from data within short time frames, on limited budgets and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. As older data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks is being developed and fully implemented on Cloud platforms. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis for two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
Pages: 110 - 116
Number of pages: 7
Related papers
50 items in total
  • [21] A Theoretical Analysis of the Fuzzy K-Means Problem
    Bloemer, Johannes
    Brauer, Sascha
    Bujna, Kathrin
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 805 - 810
  • [22] A Fuzzy Clustering Algorithm Based on K-means
    Yan, Zhen
    Pi, Dechang
    ECBI: 2009 INTERNATIONAL CONFERENCE ON ELECTRONIC COMMERCE AND BUSINESS INTELLIGENCE, PROCEEDINGS, 2009, : 523 - 528
  • [23] Sorted K-Means Towards the Enhancement of K-Means to Form Stable Clusters
    Arora, Preeti
    Virmani, Deepali
    Jindal, Himanshu
    Sharma, Mritunjaya
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION AND NETWORKS, 2017, 508 : 479 - 486
  • [24] Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral
    Wahyuningrum, Tenia
    Khomsah, Siti
    Suyanto, Suyanto
    Meliana, Selly
    Yunanto, Prasti Eko
    Al Maki, Wikky F.
    2021 4TH INTERNATIONAL SEMINAR ON RESEARCH OF INFORMATION TECHNOLOGY AND INTELLIGENT SYSTEMS (ISRITI 2021), 2020,
  • [25] Performance of the K-means and fuzzy C-means algorithms in big data analytics
    Salman Z.
    Alomary A.
    International Journal of Information Technology, 2024, 16 (1) : 465 - 470
  • [26] Comparison of K-means and K-means++ for image compression with thermographic images
    Biswas, Hridoy
    Umbaugh, Scott E.
    Marino, Dominic
    Sackman, Joseph
    THERMOSENSE: THERMAL INFRARED APPLICATIONS XLIII, 2021, 11743
  • [27] k-means and fuzzy c-means fusion for object clustering
    Heni, Ashraf
    Jdey, Imen
    Ltifi, Hela
    2022 8TH INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT'22), 2022, : 177 - 182
  • [28] Research on Commercial Bank's Performance Appraisal Based on K-Means
    Ye, Wang
    Chong, Xiao
    Xin, Liu
    PROCEEDINGS OF THE 2009 INTERNATIONAL CONFERENCE ON PUBLIC ECONOMICS AND MANAGEMENT (ICPEM 2009), VOL 5: STATISTICS AND METHODOLOGY, 2009, : 335 - 339
  • [29] Comparative Analysis of K-Means and Fuzzy C-Means Algorithms
    Ghosh, Soumi
    Dubey, Sanjay Kumar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2013, 4 (04) : 35 - 39
  • [30] Clustering Performance of an Evolutionary K-Means Algorithm
    Nigro, Libero
    Cicirelli, Franco
    Pupo, Francesco
    PROCEEDINGS OF NINTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, VOL 9, ICICT 2024, 2025, 1054 : 359 - 369